Semi-supervised Self-Training Algorithm for Density Peak Membership Optimization
by: LIU Xuewen, WANG Jikui, YANG Zhengguo, LI Bing, NIE Feiping
Format: | Article |
---|---|
Published: | Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2022-09-01 |
Description
In practice, most datasets contain only a few labels because labels are costly to obtain. Compared with supervised and unsupervised learning, semi-supervised learning can achieve higher learning performance at lower labeling cost by making full use of the large amount of unlabeled data and the small amount of labeled data in a dataset. Self-Training is a classical semi-supervised learning algorithm. While iteratively optimizing the classifier, it repeatedly selects high-confidence samples from the unlabeled samples and has the base classifier label them. These samples and their pseudo-labels are then added to the training set. Selecting high-confidence samples is a critical step in the Self-Training algorithm. Inspired by the density peaks clustering (DPC) algorithm, this paper proposes a semi-supervised Self-Training algorithm for density peak membership optimization (STDPM), which uses density peaks to select high-confidence samples. First, STDPM uses density peaks to discover the potential spatial structure of the samples and constructs a prototype tree. Second, STDPM searches the prototype tree for the unlabeled direct relatives of the labeled samples, and defines the density peaks of the unlabeled direct relatives that belong to different clusters as the clusters-peak. The clusters-peak is then normalized to obtain the density peak membership. Finally, STDPM treats samples whose membership exceeds a set threshold as high-confidence samples, which are labeled by the base classifier and added to the training set. STDPM makes full use of the density and distance information implied by the peak, which improves the quality of the selected high-confidence samples and, in turn, classification performance. Comparative experiments on 8 benchmark datasets verify the effectiveness of STDPM.
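The selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the local density uses the cutoff kernel from the original DPC formulation, the "peak" of a sample is taken as density times distance to the nearest denser sample, "direct relatives" are read as parents and children in the prototype tree, normalization divides by the largest peak among the current candidates, and a 1-nearest-neighbor rule stands in for the base classifier. The names `density_peaks`, `select_high_confidence`, `self_train`, `d_c`, and `threshold` are hypothetical.

```python
import math


def density_peaks(points, d_c):
    """Compute DPC quantities: density rho, distance delta, and tree parent."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Cutoff-kernel density: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Decreasing density; ties are broken by index (an assumption) so that
    # every point except the first has a well-defined parent.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # -1 marks the root of the prototype tree
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # DPC convention for the densest point
            continue
        # Parent = nearest point that precedes i in the density order.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    return rho, delta, parent


def select_high_confidence(points, labeled, d_c, threshold):
    """Return unlabeled tree relatives of labeled points with high membership."""
    rho, delta, parent = density_peaks(points, d_c)
    candidates = set()
    for i, p in enumerate(parent):
        if p == -1:
            continue
        if i in labeled and p not in labeled:
            candidates.add(p)  # unlabeled parent of a labeled sample
        if p in labeled and i not in labeled:
            candidates.add(i)  # unlabeled child of a labeled sample
    peak = {i: rho[i] * delta[i] for i in candidates}
    top = max(peak.values(), default=0.0)
    if top == 0:
        return []
    # Assumed normalization: membership = peak / largest candidate peak.
    return sorted(i for i in candidates if peak[i] / top > threshold)


def self_train(points, labels, d_c, threshold):
    """Grow the labeled set; `labels` maps sample index -> class label."""
    labels = dict(labels)
    while True:
        chosen = select_high_confidence(points, set(labels), d_c, threshold)
        if not chosen:
            return labels
        known = list(labels)  # pseudo-label only from previously labeled data
        for i in chosen:
            # Stand-in base classifier: copy the nearest labeled neighbor.
            nearest = min(known, key=lambda k: math.dist(points[i], points[k]))
            labels[i] = labels[nearest]
```

With two small groups of 2-D points and one labeled sample in each group, `self_train(points, {0: "a", 4: "b"}, d_c=1.5, threshold=0.8)` spreads the two labels outward along the tree, one round of relatives at a time. Because the assumed normalization is relative to the current candidates, the strongest candidate always has membership 1, so a threshold below 1 adds at least one sample per round while candidates with nonzero peak remain.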