A Novel Hot Topic Detection Framework With Integration of Image and Short Text Information From Twitter

By: Chengde Zhang, Shaozhen Lu, Chengming Zhang, Xia Xiao, Qian Wang, Gao Chen

Format: Article
Published: IEEE, 2019-01-01

Description

Tweets offer only a limited number of features and contain noisy text. These properties make it hard to extract valuable information from Twitter, so hot topic detection is a challenging task. This paper proposes a novel four-stage framework to improve topic detection performance. The first stage is data preprocessing. In the second stage, deep learning is used to enrich short text information through image understanding. In the third stage, an improved latent Dirichlet allocation (LDA) model optimizes the effective word pairs derived from images, which improves the accuracy of the extracted topic words. In the final stage, short text and images are integrated for topic detection, and the corresponding topics are mined by fuzzy matching of topic words. Extensive experiments show that the proposed framework significantly improves topic detection performance and outperforms the selected baseline methods.
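The abstract does not specify how topic words are fuzzily matched when text-derived and image-derived topics are integrated. The following is only a minimal sketch of one possible reading, not the authors' method. It assumes that each topic is a plain list of words and that two words "match" when their character-level similarity (Python's `difflib.SequenceMatcher` ratio) reaches a threshold. It further assumes that a text topic absorbs the image topic whose words it overlaps most. The function names and the thresholds `0.8` and `0.5` are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def words_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical fuzzy word match: character-level similarity ratio >= threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def topic_overlap(topic_a: list[str], topic_b: list[str], threshold: float = 0.8) -> float:
    """Fraction of words in topic_a that fuzzily match at least one word in topic_b."""
    if not topic_a:
        return 0.0
    hits = sum(1 for w in topic_a if any(words_match(w, v, threshold) for v in topic_b))
    return hits / len(topic_a)


def merge_topics(
    text_topics: list[list[str]],
    image_topics: list[list[str]],
    min_overlap: float = 0.5,
    threshold: float = 0.8,
) -> list[list[str]]:
    """Pair each text topic with its best-overlapping image topic and add the
    image words that have no fuzzy counterpart in the text topic (assumed scheme)."""
    merged = []
    for t in text_topics:
        best = max(image_topics, key=lambda i: topic_overlap(t, i, threshold), default=None)
        if best is not None and topic_overlap(t, best, threshold) >= min_overlap:
            # Keep the text words, then append only genuinely new image words.
            extra = [w for w in best if not any(words_match(w, x, threshold) for x in t)]
            merged.append(list(t) + extra)
        else:
            # No sufficiently similar image topic: keep the text topic unchanged.
            merged.append(list(t))
    return merged


if __name__ == "__main__":
    text_topics = [["earthquake", "rescue", "city"], ["football", "match", "goal"]]
    image_topics = [["earthquakes", "building", "rescuers"], ["concert", "stage"]]
    print(merge_topics(text_topics, image_topics))
```

With the sample data, "earthquakes" and "rescuers" are treated as variants of the text words, so only "building" is added to the first topic, and the second topic is left unchanged. Character-level similarity only catches spelling variants. A semantic measure such as word-embedding similarity would be a different, equally hypothetical, design choice.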