SU-QMI: A Feature Selection Method Based on Graph Theory for Prediction of Antimicrobial Resistance in Gram-Negative Bacteria

by: Abu Sayed Chowdhury, Douglas R. Call, Shira L. Broschat

Format: Article
Published: MDPI AG 2020-12-01

Description

Machine learning can be used as an alternative to similarity algorithms such as BLASTp when the latter fail to identify dissimilar antimicrobial-resistance genes (ARGs) in bacteria; however, determining the most informative characteristics, known as features, for antimicrobial resistance (AMR) is essential to obtain accurate predictions. In this paper, we introduce a feature selection algorithm called symmetrical uncertainty qualitative mutual information (SU-QMI), which selects features based on estimates of their relevance, redundancy, and interdependency. We use these together with graph theory to derive a feature selection method for identifying putative ARGs in Gram-negative bacteria. We extract physicochemical, evolutionary, and structural features from the protein sequences of five genera of Gram-negative bacteria (Acinetobacter, Klebsiella, Campylobacter, Salmonella, and Escherichia) which confer resistance to acetyltransferase (aac), β-lactamase (bla), and dihydrofolate reductase (dfr). Our SU-QMI algorithm is then used to find the best subset of features, and a support vector machine (SVM) model is trained for AMR prediction using this feature subset. We evaluate performance using an independent set of protein sequences from three Gram-negative bacterial genera (Pseudomonas, Vibrio, and Enterobacter) and achieve prediction accuracy ranging from 88 to 100%. Compared to the SU-QMI method, BLASTp requires similarity as low as 53% for comparable classification results. Our results indicate the effectiveness of the SU-QMI method for selecting the best protein features for AMR prediction in Gram-negative bacteria.
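
The abstract names symmetrical uncertainty as one ingredient of SU-QMI but does not spell out the computation. As a rough illustration only, the sketch below computes the standard symmetrical uncertainty between two discrete variables, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), which is 0 for independent variables and 1 when each fully determines the other. It is not the authors' SU-QMI algorithm: the qualitative mutual information term, the graph-theoretic selection step, and the SVM training are all omitted, and the function names and toy data are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in the range [0, 1].

    Hypothetical helper for illustration; not the SU-QMI score from the paper.
    """
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so there is no uncertainty to share.
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Toy, made-up data: a binary class label and two discretized features.
labels = [0, 0, 1, 1]
feature_a = [0, 0, 1, 1]   # mirrors the label exactly
feature_b = [0, 1, 0, 1]   # independent of the label

su_a = symmetrical_uncertainty(feature_a, labels)
su_b = symmetrical_uncertainty(feature_b, labels)
```

In a filter-style selection scheme, a score like this could be used to rank features by relevance to the class label (feature versus label) and to flag redundancy (feature versus feature); how SU-QMI combines these estimates is described in the paper itself, not here.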