Data classification algorithm for data-intensive computing environments

by: Tiedong Chen, Shifeng Liu, Daqing Gong, Honghu Gao

Format: Article
Published: SpringerOpen 2017-12-01

Description

Abstract: Data-intensive computing has received substantial attention since the arrival of the big data era, but research on data mining in data-intensive computing environments is still at an early stage. This paper proposes MR-DIDC, a decision tree classification algorithm based on the MapReduce programming framework and the SPRINT algorithm. MR-DIDC inherits the advantages of MapReduce, which makes the algorithm more suitable for data-intensive computing applications. The performance of the algorithm is evaluated on an example. The experimental results show that MR-DIDC can shorten the operation time and improve the accuracy in a big data environment.
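
The abstract does not describe how MR-DIDC works internally, so the following is only a minimal sketch of the general idea it names: evaluating a SPRINT-style split (SPRINT selects splits by the Gini index) with a map step and a reduce step. It is not the authors' implementation. All function names (`map_partition`, `reduce_counts`, `best_numeric_split`) and the toy data are hypothetical, and the map and reduce steps are plain Python functions rather than calls into a real MapReduce framework.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini impurity of a mapping from class label to count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def map_partition(records, attr):
    """Map step (hypothetical): build per-value class counts for one partition.

    `records` is an iterable of (row_dict, label) pairs.
    """
    local = defaultdict(Counter)
    for row, label in records:
        local[row[attr]][label] += 1
    return local


def reduce_counts(partials):
    """Reduce step (hypothetical): merge the per-partition class counts."""
    merged = defaultdict(Counter)
    for partial in partials:
        for value, counts in partial.items():
            merged[value].update(counts)
    return merged


def best_numeric_split(histogram):
    """Scan the sorted attribute values and return (threshold, weighted Gini)
    for the best binary split of the form `value <= threshold`.

    Returns (None, inf) when the attribute has fewer than two distinct values.
    """
    total = Counter()
    for counts in histogram.values():
        total.update(counts)
    n = sum(total.values())

    left = Counter()
    best = (None, float("inf"))
    values = sorted(histogram)
    for v in values[:-1]:  # the largest value cannot split anything off
        left.update(histogram[v])
        right = total - left  # Counter subtraction keeps positive counts only
        n_left = sum(left.values())
        score = (n_left / n) * gini(left) + ((n - n_left) / n) * gini(right)
        if score < best[1]:
            best = (v, score)
    return best


# Toy data, split into two partitions as a distributed store might hold it.
partition_1 = [({"age": 20}, "no"), ({"age": 30}, "no")]
partition_2 = [({"age": 40}, "yes"), ({"age": 50}, "yes")]

histogram = reduce_counts(
    map_partition(p, "age") for p in (partition_1, partition_2)
)
threshold, score = best_numeric_split(histogram)  # → (30, 0.0)
```

In this sketch only the compact class-count histograms cross partition boundaries, which is the property that makes this kind of split evaluation a natural fit for a map/reduce decomposition; a real deployment would also need to handle categorical attributes and recursive tree growth.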