Finding the Number of Clusters Using a Small Training Sequence
By: Dong Sik Kim
Format: Article
Published: IEEE, 2023-01-01
Description
In clustering a training sequence (TS), the K-means algorithm searches for empirically optimal representative vectors, i.e., vectors that achieve the empirical minimum, in order to inductively design representative vectors that approach the true optimum for the underlying distribution. In this paper, the convergence rates of the clustering errors are first observed as functions of $\beta^{-\alpha}$, where $\beta$ is the training ratio, which relates the TS size to the number of representative vectors, and $\alpha$ is a non-negative constant. From these convergence rates, we can observe the training performance for a finite TS size. If the TS size is relatively small, errors occur in finding the number of clusters. To reduce such errors, a compensation constant $(1-\beta^{-\alpha})^{-1}$ for the empirical errors is devised based on the rate analyses, and a novel algorithm for finding the number of clusters is proposed. The compensation constant can also be applied to other clustering applications, especially when the TS size is relatively small.
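The compensation idea can be illustrated with a minimal sketch: run K-means on a small training sequence for several candidate cluster counts, then scale each empirical error by $(1-\beta^{-\alpha})^{-1}$ with $\beta = n/k$. This is only a toy illustration, not the paper's algorithm; the choice of `alpha`, the Lloyd-iteration K-means here, and the synthetic data are all assumptions for demonstration.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns centroids and the mean squared
    clustering error (average squared distance to the nearest centroid)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                C[j] = pts.mean(0)
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return C, d.min(1).mean()

def compensated_error(err, n, k, alpha=0.5):
    """Scale the empirical error by (1 - beta^{-alpha})^{-1}, beta = n/k.
    alpha is an assumed constant here; the paper derives it from rate analyses."""
    beta = n / k
    return err / (1.0 - beta ** (-alpha))

if __name__ == "__main__":
    # Small TS drawn from three well-separated 2-D Gaussian clusters.
    rng = np.random.default_rng(1)
    X = np.concatenate(
        [rng.normal(m, 0.3, size=(30, 2)) for m in (0.0, 5.0, 10.0)]
    )
    for k in range(2, 6):
        _, err = kmeans(X, k)
        print(k, err, compensated_error(err, len(X), k))
```

Because $\beta^{-\alpha} \in (0, 1)$ when $\beta > 1$, the compensated error always exceeds the raw empirical error, and the inflation grows as the training ratio shrinks, which counteracts the downward bias of small training sequences.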