What is a successful antibiotic resistance gene? A conceptual model and machine learning predictions
Type
Master's Thesis
Program
Biotechnology (MPBIO), MSc
Published
2024
Authors
Einarsson, Elinor
Torell, Stina
Abstract
Antibiotic resistance is a global public health threat that makes bacterial infections
more difficult to treat. The spread of antibiotic resistance genes (ARGs)
is predominantly driven by horizontal gene transfer (HGT), which enables bacteria to
share genetic information directly between cells. The ability of an ARG to spread is
influenced by a range of factors and has become a popular field of research that aims
to find characteristics enabling rapid dissemination of antibiotic resistance. Such
research facilitates the identification of ARGs that can disseminate rapidly
and allows proactive measures against their dissemination to be implemented.
Bioinformatics tools were used to study the prevalence of 4,775 known ARGs in
867,318 bacterial genomes. A conceptual model describing the success of an ARG was
developed, containing four measures of dissemination: across taxonomic barriers,
across different GC environments, geographically, and
to pathogenic bacteria. By using a top-down approach that studies the success of a
gene, the thesis complements research on the factors that characterize successful
and rapid HGT. The conceptual model resulted in a success-score for each ARG
that reflected the overall performance in the four components. Among the ARGs
found to be highly successful, the most common class was multidrug resistance, followed
by aminoglycoside, β-lactam, and MLS antibiotic resistance. Furthermore,
the success-score, together with information about the genes, was used to investigate
whether the success of an ARG can be predicted with machine learning,
using a random forest algorithm for binary classification. The model was built to evaluate
the predictive performance using decreasing numbers of observations of each gene.
As expected, the predictive performance of the model improved as the number of
observations increased. Based on only one observation, it was possible to predict the
class of each gene with an average sensitivity of ~70% at 90% specificity, and with
250 observations a sensitivity of 98% could be attained. Sequence-related features
such as gene length and codon usage were important when only a few observations
of a gene were used, but as the number of observations grew, non-sequence-related
features such as the number of countries and pathogens in which a gene was found became
more relevant. A meta-analysis also explores the managerial and policy implications
of antibiotic resistance; its findings include that policies facilitating the use of
machine learning are important to implement. This study can be used as a starting
point in the modelling of antibiotic resistance gene success, aiming to help identify
emerging ARGs that have the potential to become future threats.
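
The performance figures in the abstract are reported as sensitivity at a fixed 90% specificity. As an illustration of how such a figure can be computed from classifier scores, here is a minimal Python sketch. It is not the thesis's code: the function name, the toy labels, and the toy scores are all hypothetical, and the scores merely stand in for the predicted probability of the "successful" class from a classifier such as a random forest.

```python
def sensitivity_at_specificity(labels, scores, target_specificity=0.90):
    """Return the highest sensitivity reachable at any score threshold
    whose specificity is at least `target_specificity`.

    labels: 1 = successful ARG, 0 = not successful (hypothetical encoding)
    scores: classifier scores, higher means "more likely successful"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    best = 0.0
    # Candidate thresholds: every observed score, plus +inf (predict no positives).
    for threshold in sorted(set(scores)) + [float("inf")]:
        # A gene is predicted "successful" when its score >= threshold.
        true_negatives = sum(1 for s in negatives if s < threshold)
        specificity = true_negatives / len(negatives)
        if specificity >= target_specificity:
            true_positives = sum(1 for s in positives if s >= threshold)
            best = max(best, true_positives / len(positives))
    return best


# Toy data (made up for illustration): 4 successful and 10 unsuccessful genes.
labels = [1, 1, 1, 1] + [0] * 10
scores = [0.9, 0.8, 0.6, 0.3,
          0.7, 0.5, 0.4, 0.35, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

print(sensitivity_at_specificity(labels, scores))  # 0.75
```

With ten negative examples, 90% specificity permits at most one false positive, so the lowest admissible threshold in this toy data is 0.6, which recovers three of the four positive examples.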
Subject/keywords
Antibiotic Resistance, Bioinformatics, Horizontal Gene Transfer, Successful ARGs, Machine Learning, Random Forest, Managerial implications