Semantic N-Gram Topic Modeling

by: Pooja Kherwa, Poonam Bansal

Format: Article
Published: European Alliance for Innovation (EAI) 2020-05-01

Description

This paper presents a novel approach to effective topic modeling. The approach differs from traditional vector space model-based topic modeling, which follows the Bag of Words (BOW) approach. The novelty of our approach is that it works in a phrase-based vector space, where critical measures such as pointwise mutual information (PMI) and log frequency based mutual dependency (LGMD) are applied, each phrase's suitability for a particular topic is calculated, and the most suitable semantic N-Gram phrases and terms are retained for further topic modeling. In the experiment, the proposed semantic N-Gram topic modeling is compared with collocation latent Dirichlet allocation (coll-LDA) and with latent Dirichlet allocation (LDA), the most appropriate state-of-the-art topic modeling technique. The results show that perplexity improves drastically and that the coherence score improves significantly, specifically for short-text data sets such as movie reviews and political blogs.
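
To illustrate the kind of phrase scoring the abstract refers to, the following is a minimal sketch of pointwise mutual information for adjacent word pairs, using the standard definition PMI(x, y) = log2(P(x, y) / (P(x) P(y))). It is not the authors' implementation: the function name `bigram_pmi`, the `min_count` parameter, the whitespace tokenization, and the toy sentence are all assumptions made for illustration, and the LGMD measure and the topic-suitability step are not shown.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Return {(w1, w2): PMI} for adjacent word pairs in a token list.

    Hypothetical helper: probabilities are plain relative frequencies,
    and min_count is an assumed filter for rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs seen too rarely to score reliably
        p_xy = count / n_bi           # joint probability of the pair
        p_x = unigrams[w1] / n_uni    # marginal probability of first word
        p_y = unigrams[w2] / n_uni    # marginal probability of second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy input (assumed): rank candidate phrases by PMI, highest first.
tokens = "the movie was great the movie was long a great movie".split()
ranked = sorted(bigram_pmi(tokens).items(), key=lambda kv: kv[1], reverse=True)
```

In a pipeline like the one described, pairs scoring above some chosen threshold would be kept as candidate phrases and passed on to the topic model alongside single terms; the threshold itself is a tuning choice that the abstract does not specify.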