Cluster analysis to identify prominent patterns of anti-hypertensives: A three-tiered unsupervised learning approach

By: Reetu Sharma

Format: Article
Published: Elsevier, 2020-01-01

Description

Clustering groups molecules that share similar patterns, and the grouping is governed mainly by their structural features (SFs). The challenge is to cluster with minimal human intervention so that the smallest number of groups is formed, each bringing together significant molecules that share prevalent patterns. An automatic and reliable approach to clustering molecules is crucial for the clinical assessment of medical conditions. Hypertension is one such condition, and anti-hypertensives (AHs) are the approved drugs used to treat it. Here, an attempt has been made to cluster AHs to identify the prominent patterns within each group. Principal component analysis (PCA) and k-means are well-established, independent algorithms; in this work, however, k-means clustering is preceded by PCA and followed by one-way analysis of similarities (ANOSIM). This additional step of statistical testing brings novelty to the final selection of clusters: ANOSIM highlights significant differences between two or more groups, which strengthens clustering based on the similarity of features within a group. Clustering the anti-hypertensives approved by the United States Food and Drug Administration (FDA) into six groups gives a success rate of 94.73%. A Kruskal-Wallis test for k = 6 suggests a significant difference between the sample medians. Analysis of the clusters identifies the prominent pattern within each group. The average specificity and sensitivity achieved for k = 6 are 98.0 ± 1.6% and 97.3 ± 4.5%, respectively. The structural or functional relevance of molecules that overlap in the PCA plot for k = 6 is briefly discussed. The study is likely to be useful for a preliminary assessment of prevalent trends in the SFs of AHs.
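The abstract names a three-tiered sequence (PCA, then k-means, then one-way ANOSIM) but gives no implementation details. The following is a minimal NumPy sketch of that sequence under stated assumptions, not the author's code: Euclidean distances throughout, k-means++ seeding, k-means run on the first two principal-component scores, ANOSIM computed on the original feature vectors, and synthetic feature vectors standing in for the structural features of FDA-approved AHs. All function names (`pca_scores`, `kmeans`, `anosim`, and so on) are hypothetical. ANOSIM here uses the standard statistic R = (r̄_B − r̄_W) / (M/2), where r̄_B and r̄_W are the mean ranks of between-group and within-group distances and M = n(n − 1)/2, with significance assessed by permuting the cluster labels.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_init=5, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the run with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _run in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _it in range(max_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centre to the mean of its members (keep it if the cluster is empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def anosim(X, labels, permutations=999, seed=0):
    """One-way ANOSIM on Euclidean distances: R = (mean between rank - mean within rank) / (M / 2)."""
    n = len(X)
    iu = np.triu_indices(n, k=1)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))[iu]
    ranks = average_ranks(dist)
    m = len(dist)  # number of distinct pairs, n(n-1)/2

    def r_stat(lab):
        within = lab[iu[0]] == lab[iu[1]]
        return (ranks[~within].mean() - ranks[within].mean()) / (m / 2.0)

    r_obs = r_stat(labels)
    rng = np.random.default_rng(seed)
    # Permutation test: how often does shuffling the labels give an R at least as large?
    hits = sum(r_stat(rng.permutation(labels)) >= r_obs for _ in range(permutations))
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical demo data (not from the study): 3 groups of 30 "molecules", 12 features each.
rng = np.random.default_rng(42)
true_centers = rng.normal(scale=10.0, size=(3, 12))
features = np.vstack([c + rng.normal(scale=0.5, size=(30, 12)) for c in true_centers])

scores = pca_scores(features, n_components=2)      # tier 1: PCA
labels = kmeans(scores, k=3)                       # tier 2: k-means on PC scores
R, p = anosim(features, labels, permutations=199)  # tier 3: ANOSIM on original features (assumption)
print(f"k = 3: ANOSIM R = {R:.3f}, permutation p = {p:.3f}")
```

An R close to 1 means within-cluster distances consistently rank below between-cluster distances, while R near 0 means the clusters are not separated. Comparing R and its permutation p-value across candidate values of k is one plausible way to bring ANOSIM into the final choice of clusters, but the abstract does not state the exact selection rule used in the study.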