DNA Methylation Prediction Using Reduced Features Obtained via Gappy Pair Kernel and Partial Least Square
by: Sajid Shah, Altaf Ur Rahman, Saima Jabeen, Ahmad Khan, Fiaz Gul Khan, Mohammed Elaffendi
Format: | Article |
---|---|
Published: | IEEE 2022-01-01 |
Description
It is critical to identify DNA methylation correctly because it has been linked to a variety of human disorders, particularly cancer. DNA methylation is an epigenetic process that allows cells to alter gene expression. This work deals with a type of DNA methylation called 5-methylcytosine (m5c), in which a methyl group ($CH_{3}$) is attached to the fifth carbon of cytosine. The performance of machine learning algorithms used for methylation identification is greatly degraded by poor representation of the input sequence data. In this work, we propose a classification model based on extracting highly discriminative features from the sample sequences using a gappy pair kernel. Increasing the number of features to better represent a sequence leads to the curse of dimensionality, which is handled by a dimensionality reduction technique called PLS (Partial Least Squares). The resulting features are then fed to multiple classifiers to test their discriminating power. Results are computed for two species, human and mouse, to check the robustness of the proposed model. Finally, the results are compared with state-of-the-art approaches in terms of sensitivity, specificity, and accuracy. Our proposed approach outperforms state-of-the-art techniques on all three metrics for both datasets. So that the research community can test our technique, we have uploaded our code to GitHub (https://github.com/sajidshahbs/gappypairKernel_Rcode).
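The description does not spell out the feature definition, so the following is only a minimal sketch, assuming the common formulation of a gappy pair kernel in which each feature counts a pair of k-mers separated by a gap of 0 to m positions, and the kernel value is the dot product of two such count maps. It also shows how sensitivity, specificity, and accuracy are computed from binary predictions. The helper names (`gappy_pair_features`, `feature_vector`, `gappy_pair_kernel`, `sn_sp_acc`) and the default `k` and `m` are hypothetical illustrations, not taken from the authors' R code, and the PLS and classifier stages are not reproduced here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet; pairs containing other symbols (e.g. N) are dropped from the vector


def gappy_pair_features(seq, k=1, m=2):
    """Count (left k-mer, gap, right k-mer) triples for gaps of 0..m positions.

    Hypothetical helper; k and m are illustrative defaults, not the paper's settings.
    """
    seq = seq.upper()
    counts = Counter()
    n = len(seq)
    for i in range(n - k + 1):
        left = seq[i:i + k]
        for gap in range(m + 1):
            j = i + k + gap  # start of the right k-mer after skipping `gap` positions
            if j + k > n:
                break  # the right k-mer would run past the end of the sequence
            counts[(left, gap, seq[j:j + k])] += 1
    return counts


def feature_vector(seq, k=1, m=2):
    """Fixed-length count vector of size 4**k * (m + 1) * 4**k in a fixed key order."""
    counts = gappy_pair_features(seq, k, m)
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[(a, g, b)] for a in kmers for g in range(m + 1) for b in kmers]


def gappy_pair_kernel(x, y, k=1, m=2):
    """Unnormalized kernel value: dot product of the two feature count maps."""
    fx = gappy_pair_features(x, k, m)
    fy = gappy_pair_features(y, k, m)
    return sum(c * fy[key] for key, c in fx.items())  # Counter returns 0 for absent keys


def sn_sp_acc(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = methylated)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    return sn, sp, acc


if __name__ == "__main__":
    print(len(feature_vector("ACGTAC", k=1, m=2)))  # 48 = 4 * 3 * 4
    print(gappy_pair_kernel("ACGT", "ACGT", k=1, m=1))  # 5
    print(sn_sp_acc([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5, 0.5)
```

The feature count grows as 4^(2k) * (m + 1), which is why a reduction step is needed before classification. For that step, one option (an assumption about tooling, not necessarily what the authors used) is scikit-learn's `PLSRegression` from `sklearn.cross_decomposition`: fit it on the stacked feature vectors and the binary labels, then call its `transform` method to obtain the reduced features.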