Random Projection for fast and efficient multivariate correlation analysis of high-dimensional data: A new approach
by: Claudia Grellmann, Jane Neumann, Sebastian Bitzer, Peter Kovacs, Anke Tönjes, Lars Tjelta Westlye, Ole Andreas Andreassen, Michael Stumvoll, Arno Villringer, Annette Horstmann
| Format: | Article |
|---|---|
| Published: | Frontiers Media S.A. 2016-06-01 |
Description
In recent years, technological advances have produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an expanding range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, computationally expensive in applications involving large numbers of variables, as required, for example, in functional genomics. To handle high-dimensional problems, dimension reduction might be implemented as a pre-processing step. We propose a new approach that incorporates Random Projection (RP) for dimensionality reduction into Partial Least Squares Correlation to efficiently solve high-dimensional multimodal problems like genotype-phenotype associations. We name our new method PLSC-RP. Using simulated and experimental data sets containing whole genome SNP measures as genotypes and whole brain neuroimaging measures as phenotypes, we demonstrate that PLSC-RP is drastically faster than traditional PLSC while providing statistically equivalent results. We also provide evidence that dimensionality reduction using RP is independent of data type. Therefore, PLSC-RP opens up a wide range of possible applications. It can be used for any integrative analysis that combines information from multiple sources.
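The general idea of reducing one high-dimensional block with a random projection before a PLSC-style decomposition can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the projection matrix, the reduced dimension, or how the decomposition is computed. The Gaussian projection, the choice of `k`, the power iteration used in place of a full SVD, and all function names below are assumptions made purely for illustration.

```python
import math
import random

def gaussian_projection(p, k, rng):
    """Hypothetical helper: p x k matrix with i.i.d. N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Plain-Python product of a (n x m) and b (m x r)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def center_columns(a):
    """Subtract each column's mean, as is usual before a cross-product."""
    n = len(a)
    means = [sum(col) / n for col in zip(*a)]
    return [[v - m for v, m in zip(row, means)] for row in a]

def leading_singular_triplet(c, iters=200):
    """Power iteration on c^T c; approximates the largest singular triplet."""
    ct = transpose(c)
    q = len(c[0])
    v = [1.0 / math.sqrt(q)] * q  # unit-norm starting vector
    for _ in range(iters):
        w = [row[0] for row in matmul(ct, matmul(c, [[x] for x in v]))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [row[0] for row in matmul(c, [[x] for x in v])]
    s = math.sqrt(sum(x * x for x in u))
    if s > 0.0:
        u = [x / s for x in u]
    return u, s, v

def plsc_rp_sketch(x, y, k, seed=0):
    """Assumed pipeline: center, project x from p to k columns, then
    decompose the cross-product of the reduced x with y."""
    rng = random.Random(seed)
    xc, yc = center_columns(x), center_columns(y)
    r = gaussian_projection(len(x[0]), k, rng)
    x_red = matmul(xc, r)                 # n x k instead of n x p
    c = matmul(transpose(x_red), yc)      # k x q cross-product matrix
    return x_red, c, leading_singular_triplet(c)

# Toy data: 6 samples, 50 "genotype" variables, 3 "phenotype" variables.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(6)]
Y = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]

x_red, c, (u, s, v) = plsc_rp_sketch(X, Y, k=5)
print(len(x_red), len(x_red[0]))  # 6 5
print(len(c), len(c[0]))          # 5 3
```

In this sketch the decomposition operates on a 5 x 3 matrix rather than a 50 x 3 one, which is the source of the speed-up that a projection step aims for. Whether results stay statistically equivalent to the unreduced analysis depends on the data and on `k`, and is not something this toy example demonstrates.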