Accuracy and Diversity in Ensembles of Text Categorisers

by: Juan Jose García Adeva, Ulises Cervino Beresi, Rafael A. Calvo

Format: Article
Published: Centro Latinoamericano de Estudios en Informática 2005-12-01

Description

Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while still benefiting from learning algorithms that support only two classes. The accuracy of an ensemble relies on the quality of its decomposition matrix, which in turn depends on the separation between the categories and on the diversity of the dichotomies that represent the binary classifiers. Important open questions include how to define the diversity between two dichotomies and how to combine all the pairwise diversity values into a single indicator, which we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and compare it to the well-known Hamming distance. We also examine three functions for evaluating the decomposition quality. We present a set of experiments in which these measures and functions are tested on two distinct document corpora, with several configurations of each. The analysis of the results shows only a weak relationship between the accuracy of an ensemble and its diversity.
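
As a rough illustration of the Hamming-distance baseline mentioned above, the following sketch computes the pairwise diversity between the dichotomies (columns) of a small decomposition matrix and then collapses those values into a single number. The matrix `M`, the function names, and the use of the minimum and the mean as aggregators are assumptions made for this example only; the abstract does not say which three functions the authors evaluate, and the new diversity measure they propose is not reproduced here.

```python
from itertools import combinations
from statistics import mean


def hamming(a, b):
    """Number of positions at which two equal-length dichotomies differ."""
    if len(a) != len(b):
        raise ValueError("dichotomies must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_diversity(matrix):
    """Hamming distance for every pair of columns (dichotomies).

    Rows of `matrix` are categories; columns are binary classifiers.
    """
    columns = list(zip(*matrix))
    return [hamming(c1, c2) for c1, c2 in combinations(columns, 2)]


# Hypothetical decomposition matrix: 4 categories, 3 dichotomies.
M = [
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

diversities = pairwise_diversity(M)

# Two illustrative (assumed) ways to reduce the pairwise values
# to a single decomposition-quality indicator.
quality_min = min(diversities)
quality_mean = mean(diversities)

print(diversities, quality_min, quality_mean)
```

Taking the minimum rewards matrices whose least diverse pair of dichotomies is still well separated, while the mean reflects overall diversity; which aggregator, if either, tracks ensemble accuracy is exactly the kind of question the experiments address.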