Discriminative Feature Spamming Technique for Roman Urdu Sentiment Analysis
by: Khawar Mehmood, Daryl Essam, Kamran Shafi, Muhammad Kamran Malik
| Format: | Article |
|---|---|
| Published: | IEEE 2019-01-01 |
Description
Term weighting is one of the most commonly used approaches for improving the performance of information retrieval and text categorization tasks; it works by assigning weights to terms. In this paper, we present a novel term weighting technique, the discriminative feature spamming technique (DFST), which identifies distinctive terms based on term utility criteria (TUC) and then spams them to increase their discriminative power. The experimental results show that DFST outperformed a set of time-tested term weighting schemes from the information retrieval field. All experiments were performed on the largest Roman Urdu (RU) dataset to date, comprising 11,000 reviews, which was collected and annotated for this work. In addition, a custom tokenizer was built, which further improved classification accuracy. A cross-scheme comparison showed that the improvements obtained with the newly proposed DFST over previous approaches were statistically significant.
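The abstract does not define the TUC or the exact spamming procedure, so the following is only a minimal sketch of the general idea of "select distinctive terms, then repeat them". Everything in it is an assumption made for illustration: the whitespace tokenizer, the smoothed document-rate ratio used as a stand-in utility criterion, the `min_ratio` and `factor` parameters, and the toy reviews. It should not be read as the authors' method.

```python
from collections import Counter


def tokenize(text):
    # Placeholder whitespace tokenizer; the paper's custom tokenizer is not described here.
    return text.lower().split()


def distinctive_terms(docs, labels, min_ratio=2.0):
    """Select terms that occur far more often in one class than in any other.

    Hypothetical stand-in for a term utility criterion: a term is kept when its
    smoothed document rate in its best class is at least `min_ratio` times its
    rate in the runner-up class.
    """
    n_docs = Counter(labels)          # number of documents per class
    df = {}                           # per-class document frequency of each term
    for text, label in zip(docs, labels):
        df.setdefault(label, Counter()).update(set(tokenize(text)))

    vocab = set().union(*df.values())
    selected = set()
    for term in vocab:
        # Add-one smoothing keeps the ratio finite for terms absent from a class.
        rates = sorted(
            ((df[c][term] + 1) / (n_docs[c] + 2) for c in df), reverse=True
        )
        if len(rates) > 1 and rates[0] / rates[1] >= min_ratio:
            selected.add(term)
    return selected


def spam(text, selected, factor=3):
    """Repeat each selected term `factor` times so it carries more weight
    in any count-based representation built from the returned tokens."""
    out = []
    for tok in tokenize(text):
        out.extend([tok] * (factor if tok in selected else 1))
    return out


# Toy Roman Urdu style reviews, invented for this example.
docs = ["acha phone hai", "bura phone hai", "acha camera", "bura battery"]
labels = ["pos", "neg", "pos", "neg"]

selected = distinctive_terms(docs, labels, min_ratio=2.5)
print(sorted(selected))               # ['acha', 'bura']
print(spam(docs[0], selected))        # ['acha', 'acha', 'acha', 'phone', 'hai']
```

The spammed token lists can then be fed to any bag-of-words classifier. Because the selected terms appear more often, their raw counts rise relative to non-distinctive terms such as "phone", which is the effect the abstract describes as increasing discriminative power.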