BaNeP: An End-to-End Neural Network Based Model for Bangla Parts-of-Speech Tagging
by: Jesan Ahammed Ovi, Md. Ashraful Islam, Md. Rezaul Karim
Format: | Article |
---|---|
Published: | IEEE 2022-01-01 |
Description
In Natural Language Processing, Parts-of-Speech (POS) tagging is a vital component that significantly impacts applications such as machine translation, spell checking, information retrieval, and speech processing. For languages such as English and Dutch, POS tagging is considered a solved problem (accuracy: 97%). For low-resource languages like Bangla, however, challenges remain. In this article, we propose a novel RNN-based network named BaNeP to determine the parts of speech of Bangla words. The proposed network extracts structural features through a bidirectional LSTM-based sub-network and identifies intricate contextual relations among the words of a sentence through an elaborate weighted context extraction procedure. These features are then combined to generate the final Parts-of-Speech prediction. Training the model requires only an annotated dataset, eliminating the need for any hand-crafted features. Experimental results on the LDC2010T16 dataset show a significant accuracy improvement over existing Bangla POS taggers.