Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests
By: Labouriau Rodrigo, de Abreu Gabriel CG, Edwards David
| Format: | Article |
|---|---|
| Published: | BMC 2010-01-01 |
Description
<p>Abstract</p>
<p>Background</p>
<p>Chow and Liu showed that the maximum likelihood tree for multivariate discrete distributions may be found using a maximum weight spanning tree algorithm, for example Kruskal's algorithm. The efficiency of the algorithm makes it tractable for high-dimensional problems.</p>
<p>Results</p>
<p>We extend Chow and Liu's approach in two ways: first, to find the forest optimizing a penalized likelihood criterion, for example AIC or BIC, and second, to handle data with both discrete and Gaussian variables. We apply the approach to three datasets: two from gene expression studies and the third from a genetics of gene expression study. The minimal BIC forest supplements a conventional analysis of differential expression by providing a tentative network for the differentially expressed genes. In the genetics of gene expression context the method identifies a network approximating the joint distribution of the DNA markers and the gene expression levels.</p>
<p>Conclusions</p>
<p>The approach is generally useful as a preliminary step towards understanding the overall dependence structure of high-dimensional discrete and/or continuous data. Trees and forests are unrealistically simple models for biological systems, but can provide useful insights. Uses include the following: identification of distinct connected components, which can be analysed separately (dimension reduction); identification of neighbourhoods for more detailed analyses; as initial models for search algorithms with a larger search space, for example decomposable models or Bayesian networks; and identification of interesting features, such as hub nodes.</p>
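The penalized forest search described in the abstract can be sketched for the all-discrete case. This is a minimal illustration under stated assumptions, not the authors' implementation. It makes the following assumptions:

* The data are complete and every variable is discrete. The Gaussian and mixed cases are not covered.
* Each candidate edge is scored by the empirical mutual information I(Xi; Xj).
* The edge's extra parameters are counted as df = (|Xi| - 1)(|Xj| - 1), using only the levels observed in the data.

Adding edge {i, j} to a forest lowers the criterion by 2N * I(Xi; Xj) - k * df, where k = 2 for AIC and k = log N for BIC. Running Kruskal's algorithm on these penalized weights, after discarding edges whose weight is not positive, therefore gives the forest with minimal AIC or BIC among all forests. Without the penalty and the positivity filter, the same procedure reduces to Chow and Liu's maximum likelihood tree. The function names `mutual_information` and `minimal_ic_forest` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (a, b), nab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (nab / n) * math.log(nab * n / (px[a] * py[b]))
    return mi


def minimal_ic_forest(columns, criterion="BIC"):
    """Hypothetical helper: edges (i, j, weight) of a minimal AIC/BIC forest.

    Assumes all columns are discrete, complete and of equal length.
    """
    if criterion not in ("AIC", "BIC"):
        raise ValueError("criterion must be 'AIC' or 'BIC'")
    n = len(columns[0])
    k = 2.0 if criterion == "AIC" else math.log(n)  # penalty per parameter

    edges = []
    for i, j in combinations(range(len(columns)), 2):
        # Assumed df rule: count only the levels observed in the data.
        df = (len(set(columns[i])) - 1) * (len(set(columns[j])) - 1)
        # Deviance gain 2*N*I minus the penalty for the edge's extra parameters.
        w = 2 * n * mutual_information(columns[i], columns[j]) - k * df
        if w > 0:  # an edge that does not lower the criterion is never useful
            edges.append((w, i, j))
    edges.sort(reverse=True)  # Kruskal: consider the heaviest edges first

    parent = list(range(len(columns)))  # union-find over the vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    forest = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            forest.append((i, j, w))
    return forest
```

As an example, take three binary columns of length 100. The second column copies the first, and the third is independent of both. The BIC forest then contains the single edge (0, 1). The two independent pairs have zero mutual information, so their penalized weights are negative and they are never added. Because weights are only added while no cycle forms, the result can be a forest with several connected components. This matches the abstract's use of components for dimension reduction.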