WORD SENSE DISAMBIGUATION FOR TAMIL LANGUAGE USING PART-OF-SPEECH AND CLUSTERING TECHNIQUE

by: P. ISWARYA, V. RADHA

Format: Article
Published: Taylor's University 2017-09-01

Description

Word sense disambiguation is an important task in Natural Language Processing (NLP). This paper concentrates on the problem of target word selection in machine translation. The proposed method, called enhanced Word Sense Disambiguation with Part-of-Speech and Clustering based Sense-collocation (WSDPCS), consists of two steps: (i) disambiguating word senses with a Part-of-Speech (POS) tagger, and (ii) enhanced disambiguation based on clustering and a sense-collocation dictionary. In the first step, ambiguous Tamil words are disambiguated using Tamil and English POS taggers. If the candidate senses of a word share the same POS category label, the word is passed to the next step. In the second step, the remaining ambiguity is resolved using the sense-collocation dictionary. The experimental analysis shows that the proposed WSDPCS method achieves a 1.86% improvement in accuracy over an existing method.
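
The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Sense` class, the `LEXICON` entries, the transliterated collocates, and the `disambiguate` function are all hypothetical, the POS tag is assumed to come from an external tagger, and the clustering stage is left out entirely. The second step is approximated here by a simple count of context words that overlap with each sense's collocates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    gloss: str             # English target word for this sense
    pos: str               # coarse POS label of this sense
    collocates: frozenset  # toy sense-collocation entries (assumed)


# Hypothetical toy lexicon; entries are illustrative, not from the paper.
LEXICON = {
    "padi": [
        Sense("study", "VERB", frozenset({"paadam", "puththagam"})),
        Sense("step", "NOUN", frozenset({"maadi", "eru"})),
        Sense("measure", "NOUN", frozenset({"arisi", "alavu"})),
    ]
}


def disambiguate(word, pos_tag, context):
    """Return an English gloss for `word`, or None if it stays unresolved.

    `pos_tag` is the tag assigned to the word in its sentence by some
    external POS tagger (assumed); `context` is the list of neighbouring
    words in the same sentence.
    """
    senses = LEXICON.get(word, [])

    # Step 1: keep only the senses whose POS matches the tagger output.
    candidates = [s for s in senses if s.pos == pos_tag]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0].gloss

    # Step 2: several senses share the POS label, so fall back to the
    # sense-collocation entries and pick the sense with the most overlap.
    ctx = set(context)
    best = max(candidates, key=lambda s: len(s.collocates & ctx))
    if not best.collocates & ctx:
        return None  # no collocate seen in the context
    return best.gloss


if __name__ == "__main__":
    print(disambiguate("padi", "VERB", ["paadam"]))
    print(disambiguate("padi", "NOUN", ["maadi", "eru"]))
    print(disambiguate("padi", "NOUN", ["arisi"]))
```

In this sketch the verb reading is settled by the POS tag alone, while the two noun readings need the collocation lookup, which mirrors the hand-off between the two steps in the description.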