Robust Model Design by Comparative Evaluation of Clustering Algorithms
By: Xiaopeng Chen, Chanseok Park, Xuehong Gao, Bosung Kim
| Format: | Article |
|---|---|
| Published: | IEEE 2023-01-01 |
Description
The K-means algorithm, widely used in cluster analysis, is a centroid-based clustering method known for its high efficiency and scalability. In realistic settings, however, data are susceptible to contamination from outliers and from departures from the assumed distribution, which can distort K-means clustering results or render them invalid. In this paper, we introduce three alternative algorithms for weighted data, K-weighted-medians, K-weighted-L2-medians, and K-weighted-HLs, to address these issues. We investigate the impact of contamination by examining its effect on the estimation of the optimal cluster centroids. We study the robustness of the clustering algorithms in terms of the breakdown point, and then conduct experiments on simulated and real datasets to evaluate their performance using two new numerical metrics: relative efficiency based on generalized variance, and average Euclidean distance. The results demonstrate the effectiveness of the proposed K-weighted-HLs algorithm, which outperforms the other algorithms in scenarios involving both types of contamination (outliers and distribution departures).
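The abstract contrasts K-means with centroid updates based on more robust location estimators. The Python below is a minimal sketch, not the authors' implementation. It assumes a Lloyd-style assign/update loop and takes the K-weighted-medians update to be the coordinate-wise weighted median; that reading is our assumption, and the paper may define it differently. The sketch compares this update with the weighted-mean update of K-means on weighted data contaminated by outliers, scoring each result by the average Euclidean distance to the true centers. The function names `weighted_median`, `k_weighted_centers`, and `avg_euclidean_distance` are hypothetical. The K-weighted-L2-medians and K-weighted-HLs updates are not shown.

```python
import numpy as np


def weighted_median(values, weights):
    """Lower weighted median of a 1-D array: the smallest value at which
    the cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]


def k_weighted_centers(X, w, init, center="median", n_iter=100):
    """Lloyd-style alternation for weighted data (hypothetical, illustrative).

    Each iteration assigns every point to its nearest center (squared
    Euclidean distance), then recomputes each center from its members as
    either the weighted mean (center="mean", i.e. weighted K-means) or the
    coordinate-wise weighted median (center="median", an ASSUMED form of
    K-weighted-medians)."""
    C = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_C = C.copy()
        for k in range(len(C)):
            mask = labels == k
            if not mask.any():
                continue  # leave an empty cluster's center where it is
            Xk, wk = X[mask], w[mask]
            if center == "mean":
                new_C[k] = np.average(Xk, axis=0, weights=wk)
            else:
                new_C[k] = [weighted_median(Xk[:, j], wk) for j in range(X.shape[1])]
        converged = np.allclose(new_C, C)
        C = new_C
        if converged:
            break
    return C, labels


def avg_euclidean_distance(C_hat, C_true):
    """Mean Euclidean distance between estimated and true centers, assuming
    row k of C_hat corresponds to row k of C_true."""
    return np.linalg.norm(C_hat - C_true, axis=1).mean()


# Two well-separated Gaussian clusters with random positive weights.
rng = np.random.default_rng(0)
true_C = np.array([[0.0, 0.0], [10.0, 10.0]])
clean = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in true_C])
# Contamination: 10 gross outliers that still fall nearest the first center.
outliers = np.full((10, 2), -40.0)
X = np.vstack([clean, outliers])
w = rng.uniform(0.5, 1.5, size=len(X))

C_mean, _ = k_weighted_centers(X, w, init=true_C, center="mean")
C_median, _ = k_weighted_centers(X, w, init=true_C, center="median")
err_mean = avg_euclidean_distance(C_mean, true_C)
err_median = avg_euclidean_distance(C_median, true_C)
print(f"weighted mean centers:   error {err_mean:.3f}")
print(f"weighted median centers: error {err_median:.3f}")
```

Because the outliers hold roughly a tenth of the total weight in the first cluster, they pull its weighted mean several units away from the origin, while the weighted median barely moves. A spatial (L2) median or an HL-type estimator could be substituted in the same update step, but the exact weighted forms used in the paper are not reproduced here.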