Compatibility patterns in antibiotic resistant genes in pathogens
Type
Master's thesis
Programme
Biotechnology (MPBIO), MSc
Published
2020
Author
Lindbom, Agnes
Abstract
Antibiotic resistance is increasing globally and poses a substantial threat to public health,
leading to higher costs and longer hospital stays. Infections caused by antibiotic-resistant
bacteria are harder to treat, and without action there is a risk that common infections will
eventually become life-threatening again. Antibiotic resistance can be acquired through
mutations in existing genes or through resistance genes transferred from other bacteria by
horizontal gene transfer. Why some genes are transferred could be explained by the
compatibility of the gene, but there is currently little information about what is needed to
make a gene compatible, so that it can be transferred and give rise to new antibiotic-resistant
bacteria.
The aim of this project is to investigate whether there are compatibility patterns in antibiotic
resistance genes in the pathogens Escherichia coli and Klebsiella pneumoniae. This was
done through three analyses: kmer analysis, frequency analysis and analysis of the regions
surrounding the gene. The project also included predictive models to investigate the predictive
ability of a logistic regression model. The data were collected from NCBI and ResFinder,
and core genomes of the species were used. Using BLAST, the antibiotic resistance genes were
divided into groups according to whether or not they were present in each species.
The kmer analysis compared kmer distributions for different kmer lengths using three measures:
squared Euclidean distance, absolute maximum values and maximum value. It gave
similar results for both species. For smaller kmer lengths, differences could be seen for
the species in the median and p-values between the antibiotic resistance genes and the
core genome genes. With increasing kmer length, fewer differences could be seen. In the
frequency analysis, the genes were merged into gene groups and the values from the kmer
analysis were compared against the number of hits in the BLAST results. Many values
clustered around the median, and the gene groups with the most hits were close to
the median, while values far from the median had few
hits. No clear conclusion could be drawn about different antibiotic classes. In the analysis
of the regions surrounding the gene, sequences of 100 bp upstream and downstream of each
gene were extracted and the genes were divided into groups according to whether they were
present in both species or in only one. Differences could be seen between the unique groups of
the species, while the difference between the shared groups was surprisingly small. At the kmer
level, the kmers that differed most between the species showed no clear correlation; potentially
they could be related to the higher GC content in K. pneumoniae. Predictive models were
created with logistic regression for all genes and for three different antibiotic classes. The
model for all genes included the gene length and 21 of the 64 possible kmers. The model
performed better than random classification, with a best true positive rate of
72% and a false positive rate of 11%. The other models included fewer kmers, and most of
the classifiers performed better than random classification.
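To make the kmer comparison above concrete, the following is a minimal sketch of how a kmer distribution and two distance measures could be computed. It is not the thesis code: the function names are hypothetical, and "absolute maximum values" is interpreted here as the largest absolute difference between two distributions, which is an assumption. Note that for kmers of length 3 there are 4^3 = 64 possible kmers, matching the 64 mentioned for the predictive model.

```python
from collections import Counter
from itertools import product


def kmer_distribution(seq, k):
    """Relative frequency of every possible DNA kmer of length k in seq.

    Windows containing characters other than A, C, G, T are ignored.
    (Hypothetical helper, written for illustration only.)
    """
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in all_kmers)
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


def squared_euclidean(p, q):
    """Squared Euclidean distance between two kmer distributions."""
    return sum((p[m] - q[m]) ** 2 for m in p)


def max_abs_difference(p, q):
    """Largest absolute per-kmer difference (assumed reading of
    'absolute maximum values'; the abstract does not define it)."""
    return max(abs(p[m] - q[m]) for m in p)


# Example: compare a made-up gene fragment with a made-up reference.
gene = kmer_distribution("ATGGCGTTAGCCGGATAA", 3)
reference = kmer_distribution("ATGAAACGCATTAGCACCTAA", 3)
distance = squared_euclidean(gene, reference)
largest_gap = max_abs_difference(gene, reference)
```

In such a setup, each resistance gene would be compared against a reference distribution (for instance one derived from the core genome), and the resulting values summarised per group; how the thesis defines the reference is not stated in the abstract.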
In this project, analyses have been developed, and the results reveal compatibility
patterns in the antibiotic resistance genes, found by examining the gene sequences and the
regions surrounding the genes. From the gene sequences, it has also been possible to predict
gene compatibility with a predictive model.
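The true positive and false positive rates quoted for the predictive models can be computed from a classifier's predictions as in the sketch below. The function names are hypothetical and the labels are assumed to be coded as 1 (gene present in the species) and 0 (absent); the abstract does not specify the coding. A classifier that guesses at random has, in expectation, equal true and false positive rates, so a result counts as better than random here when the true positive rate exceeds the false positive rate.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def beats_random(tpr, fpr):
    """True if the point (fpr, tpr) lies above the chance diagonal."""
    return tpr > fpr


# Toy example with made-up labels, not data from the thesis.
rates = tpr_fpr([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
```

With the values reported above (true positive rate 72%, false positive rate 11%), `beats_random(0.72, 0.11)` evaluates to `True`.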
Subject/keywords
antibiotic resistance, pathogens, horizontal gene transfer, gene compatibility, predictive model, logistic regression, bioinformatics