Naïve Bayes evidence accumulation K-modes clustering: A new method for classifying binary data and its application on real data of injecting drug users

by: Zahra Zamaninasab, Hamid Sharifi, Abbas Bahrampour

Format:	Article
Published:	Tehran University of Medical Sciences 2018-10-01

Description

Background & Aim: Clustering methods such as K-modes group discrete data, whereas the Naïve Bayes classifier is a classification method that predicts unknown true classes. In this research, we improve K-modes results by applying the Evidence Accumulation (EA) method to obtain and retain an initial mode vector, which is then used in Naïve Bayes EA K-modes. Methods & Materials: The methods were applied to four real datasets for which the true classes are known, in order to check the external validity and purity of our methods. The free software R was used, with package klaR for K-modes and EA and package e1071 for Naïve Bayes. In addition, the methods were applied to the national Injecting Drug Users (IDU) dataset, with a sample size of 2546. Results: The EA K-modes algorithm was applied to five real datasets, and K-modes was then rerun with the retained initial mode vector. The purity of EA K-modes (0.544, 0.862, 0.914, 0.944, 0.625) differed significantly from that of classic K-modes (0.497, 0.610, 0.404, 0.650, 0.625). Finally, we applied the Naïve Bayes classifier with prior probabilities obtained from EA K-modes. For K = 2, Naïve Bayes EA K-modes produced better clustering (0.71, 0.873, against 0.625, 0.862 for EA K-modes and 0.497, 0.61 for K-modes). Conclusion: In this paper, we proposed Naïve Bayes EA K-modes as a new method for clustering binary data. The new method yields stable clustering compared with previous studies. The Naïve Bayes EA K-modes method improves purity and establishes a better separation.
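
The abstract reports purity as its external validity measure but does not define it. As a minimal sketch, assuming the standard definition (the share of observations that belong to the majority true class of their assigned cluster), purity could be computed as follows. The function name `purity` and the toy labels are hypothetical illustrations, not taken from the article, and Python is used here although the authors worked in R.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Fraction of observations in the majority true class of their cluster.

    Assumes the standard purity definition; the article's exact
    computation is not given in the abstract.
    """
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have the same length")
    if len(cluster_labels) == 0:
        raise ValueError("label sequences must not be empty")

    # Count true classes inside each cluster.
    counts_by_cluster = {}
    for cluster, true_class in zip(cluster_labels, true_labels):
        counts_by_cluster.setdefault(cluster, Counter())[true_class] += 1

    # Sum the size of the majority class over all clusters.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(cluster_labels)


# Hypothetical toy data: two clusters, two true classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_classes = ["a", "a", "b", "b", "b", "a"]
toy_purity = purity(toy_clusters, toy_classes)  # 4 of 6 observations match
```

Under this definition a value of 1.0 means every cluster contains a single true class, so higher values indicate a cleaner separation.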
