Finding the Number of Clusters Using a Small Training Sequence
By: Dong Sik Kim
Format: Article
Published: IEEE, 2023-01-01
Description
In clustering a training sequence (TS), the K-means algorithm searches for empirically optimal representative vectors, i.e., vectors that achieve the empirical minimum, in order to inductively design representative vectors that approach the true optimum for the underlying distribution. In this paper, the convergence rates of the clustering errors are first observed as functions of $\beta^{-\alpha}$, where $\beta$ is the training ratio, which relates the TS size to the number of representative vectors, and $\alpha$ is a non-negative constant. From these convergence rates, we can observe the training performance for a finite TS size. If the TS size is relatively small, errors occur in finding the number of clusters. To reduce such errors, a compensation constant $(1-\beta^{-\alpha})^{-1}$ for the empirical errors is devised based on the rate analyses, and a novel algorithm for finding the number of clusters is proposed. The compensation constant can also be applied to other clustering applications, especially when the TS size is relatively small.
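The compensation idea can be illustrated with a minimal sketch: run K-means on a small training sequence for several candidate cluster counts, then scale each empirical error by $(1-\beta^{-\alpha})^{-1}$ with $\beta = n/k$. This is only a toy illustration, not the paper's algorithm; the choice of `alpha`, the Lloyd-iteration K-means here, and the synthetic data are all assumptions for demonstration.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns centroids and the mean squared
    clustering error (average squared distance to the nearest centroid)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                C[j] = pts.mean(0)
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return C, d.min(1).mean()

def compensated_error(err, n, k, alpha=0.5):
    """Scale the empirical error by (1 - beta^{-alpha})^{-1}, beta = n/k.
    alpha is an assumed constant here; the paper derives it from rate analyses."""
    beta = n / k
    return err / (1.0 - beta ** (-alpha))

if __name__ == "__main__":
    # Small TS drawn from three well-separated 2-D Gaussian clusters.
    rng = np.random.default_rng(1)
    X = np.concatenate(
        [rng.normal(m, 0.3, size=(30, 2)) for m in (0.0, 5.0, 10.0)]
    )
    for k in range(2, 6):
        _, err = kmeans(X, k)
        print(k, err, compensated_error(err, len(X), k))
```

Because $\beta^{-\alpha} \in (0, 1)$ when $\beta > 1$, the compensated error always exceeds the raw empirical error, and the inflation grows as the training ratio shrinks, which counteracts the downward bias of small training sequences.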