Source Attribution of Human Campylobacteriosis Using Whole-Genome Sequencing Data and Network Analysis

oleh: Lynda Wainaina, Alessandra Merlotti, Daniel Remondini, Clementine Henri, Tine Hald, Patrick Murigu Kamau Njage

Format: Article
Diterbitkan: MDPI AG 2022-06-01

Deskripsi

<i>Campylobacter</i> spp. are a leading and increasing cause of gastrointestinal infections worldwide. Source attribution, which apportions human infection cases to different animal species and food reservoirs, has been instrumental in control- and evidence-based intervention efforts. The rapid increase in whole-genome sequencing data provides an opportunity for higher-resolution source attribution models. Important challenges, including the high dimension and complex structure of WGS data, have inspired concerted research efforts to develop new models. We propose network analysis models as an accurate, high-resolution source attribution approach for the sources of human campylobacteriosis. A weighted network analysis approach was used in this study for source attribution comparing different WGS data inputs. The compared model inputs consisted of cgMLST and wgMLST distance matrices from 717 human and 717 animal isolates from cattle, chickens, dogs, ducks, pigs and turkeys. SNP distance matrices from 720 human and 720 animal isolates were also used. The data were collected from 2015 to 2017 in Denmark, with the animal sources consisting of domestic and imports from 7 European countries. Clusters consisted of network nodes representing respective genomes and links representing distances between genomes. Based on the results, animal sources were the main driving factor for cluster formation, followed by type of species and sampling year. The coherence source clustering (CSC) values based on animal sources were <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>78</mn><mo>%</mo></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>81</mn><mo>%</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>78</mn><mo>%</mo></mrow></semantics></math></inline-formula> for cgMLST, wgMLST and SNP, respectively. The CSC values based on <i>Campylobacter</i> species were <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>78</mn><mo>%</mo></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>79</mn><mo>%</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>69</mn><mo>%</mo></mrow></semantics></math></inline-formula> for cgMLST, wgMLST and SNP, respectively. Including human isolates in the network resulted in <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>88</mn><mo>%</mo></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>77</mn><mo>%</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>88</mn><mo>%</mo></mrow></semantics></math></inline-formula> of the total human isolates being clustered with the different animal sources for cgMLST, wgMLST and SNP, respectively. Between <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>12</mn><mo>%</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>23</mn><mo>%</mo></mrow></semantics></math></inline-formula> of human isolates were not attributed to any animal source. Most of the human genomes were attributed to chickens from Denmark, with an average attribution percentage of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>52.8</mn><mo>%</mo></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>52.2</mn><mo>%</mo></mrow></semantics></math></inline-formula> and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>51.2</mn><mo>%</mo></mrow></semantics></math></inline-formula> for cgMLST, wgMLST and SNP distance matrices respectively, while ducks from Denmark showed the least attribution of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0</mn><mo>%</mo></mrow></semantics></math></inline-formula> for all three distance matrices. The best-performing model was the one using wgMLST distance matrix as input data, which had a CSC value of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>81</mn><mo>%</mo></mrow></semantics></math></inline-formula>. Results from our study show that the weighted network-based approach for source attribution is reliable and can be used as an alternative method for source attribution considering the high performance of the model. The model is also robust across the different <i>Campylobacter</i> species, animal sources and WGS data types used as input.