CMA – a comprehensive Bioconductor package for supervised classification with high dimensional data

by: Boulesteix A-L, Daumer M, Slawski M

Format: Article
Published: BMC 2008-10-01

Description

<p>Abstract</p> <p>Background</p> <p>For the last eight years, microarray-based classification has been a major topic in statistics, bioinformatics, and biomedical research. Traditional methods often yield unsatisfactory results or may even be inapplicable in the so-called "<it>p</it> ≫ <it>n</it>" setting, where the number of predictors <it>p</it> far exceeds the number of observations <it>n</it>, hence the term "ill-posed problem". Careful model selection and evaluation that satisfy accepted good-practice standards are a very complex task for statisticians without experience in this area and for scientists with a limited statistical background. The multiplicity of available methods for class prediction based on high-dimensional data is an additional practical challenge for inexperienced researchers.</p> <p>Results</p> <p>In this article, we introduce a new Bioconductor package called CMA (standing for "<b>C</b>lassification for <b>M</b>icro<b>A</b>rrays") for automatically performing variable selection, parameter tuning, classifier construction, and unbiased evaluation of the constructed classifiers using a large number of standard methods. With little time and effort, users obtain an overview of the unbiased accuracy of most top-performing classifiers. Furthermore, the standardized evaluation framework underlying CMA can also be beneficial in statistical research for comparison purposes, for instance when a new classifier has to be compared to existing approaches.</p> <p>Conclusion</p> <p>CMA is a user-friendly, comprehensive package for classifier construction and evaluation that implements most standard approaches. It is freely available from the Bioconductor website at <url>http://bioconductor.org/packages/2.3/bioc/html/CMA.html</url>.</p>
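The abstract stresses unbiased evaluation in the "p ≫ n" setting. One common way to avoid optimistic bias is to repeat variable selection inside every cross-validation fold, using the training samples only. The sketch below illustrates that idea in plain Python on synthetic data. It is not CMA's API (CMA is an R package), and every name in it (`make_data`, `select_top`, `cv_error`, the nearest-centroid classifier, the t-statistic filter) is a hypothetical choice made for illustration rather than something taken from the article.

```python
import random
import statistics


def make_data(n=40, p=200, n_informative=5, shift=3.0, seed=1):
    """Synthetic two-class data with p >> n; only the first columns carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0, 1) + (shift if lab == 1 and j < n_informative else 0.0)
          for j in range(p)] for lab in y]
    return X, y


def t_scores(X, y):
    """Absolute Welch t-statistic of every column for labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0)
    return scores


def select_top(X, y, k):
    """Indices of the k columns with the largest absolute t-statistic."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_centroids(X, y, features):
    """Class centroids restricted to the selected features."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Label of the nearest centroid in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def cv_error(X, y, k_features, n_folds=5, seed=0):
    """Misclassification rate from (non-stratified) k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Selection sees the training samples only, never the held-out fold.
        features = select_top(X_train, y_train, k_features)
        centroids = fit_centroids(X_train, y_train, features)
        errors += sum(predict(centroids, X[i], features) != y[i] for i in test_idx)
    return errors / len(y)


X, y = make_data()
err = cv_error(X, y, k_features=5)
print(f"cross-validated misclassification rate: {err:.3f}")
```

Selecting features once on the full data set and cross-validating only the classifier would let the held-out samples influence the selection, which tends to make the estimated error too small. Keeping selection inside the loop is the design choice this sketch is meant to show.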