Improved sqrt-cosine similarity measurement
by: Sahar Sohangir, Dingding Wang
| Format: | Article |
|---|---|
| Published: | SpringerOpen 2017-07-01 |
Description
Abstract: Text similarity measurement aims to find the commonality existing among text documents, which is fundamental to most information extraction, information retrieval, and text mining problems. Cosine similarity based on Euclidean distance is currently one of the most widely used similarity measurements. However, Euclidean distance is generally not an effective metric for dealing with probabilities, which are often used in text analytics. In this paper, we propose a new similarity measure based on sqrt-cosine similarity. We apply the proposed improved sqrt-cosine similarity to a variety of document-understanding tasks, such as text classification, clustering, and query search. Comprehensive experiments are then conducted to evaluate our new similarity measurement in comparison to existing methods. These experimental results show that our proposed method is indeed effective.
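The abstract does not give the formula, so the following is only an illustrative sketch. It assumes the improved sqrt-cosine measure takes the Hellinger-style form sum(sqrt(x_i * y_i)) / (sqrt(sum(x_i)) * sqrt(sum(y_i))) for non-negative vectors such as term frequencies or probabilities, and sets it beside ordinary cosine similarity for comparison. The function names are hypothetical and are not taken from the article.

```python
import math


def cosine_similarity(x, y):
    """Ordinary cosine similarity: dot product over the product of L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_x * norm_y)


def improved_sqrt_cosine(x, y):
    """Assumed form of the improved sqrt-cosine similarity.

    Numerator: sum of sqrt(x_i * y_i).
    Denominator: sqrt(sum x_i) * sqrt(sum y_i).
    Inputs must be non-negative (e.g. term frequencies or probabilities).
    """
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("inputs must be non-negative")
    denom = math.sqrt(sum(x)) * math.sqrt(sum(y))
    if denom == 0:
        return 0.0  # same zero-vector convention as above
    num = sum(math.sqrt(a * b) for a, b in zip(x, y))
    return num / denom


# Two toy term-probability vectors over a three-word vocabulary.
doc_a = [0.5, 0.3, 0.2]
doc_b = [0.4, 0.4, 0.2]
print(cosine_similarity(doc_a, doc_b))
print(improved_sqrt_cosine(doc_a, doc_b))
```

Under this assumed form, a vector compared with itself scores 1, and vectors with no shared non-zero terms score 0, which matches the behaviour expected of a normalized similarity measure.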