AI-guided detection of antibiotic-resistant bacteria using resistance genes
Type
Master's Thesis
Program
Biomedical engineering (MPBME), MSc
Published
2024
Author
Aerts, Erik
Abstract
Antibiotic resistance is threatening advancements made in modern medicine. Understanding
the genomics behind multi-resistant profiles can assist in planning the
correct treatment, which can reduce antibiotic usage and hamper the
vicious resistance cycle. Transformer-based AI models have shown state-of-the-art
performance in understanding complex patterns in data. This thesis aimed to create
a framework for using transformers to predict bacterial resistance profiles
by training on genomic data. The framework consisted of a transformer-based
encoder and parallel classification networks for predicting antibiotic susceptibility.
Each model was trained on antibiotic resistance genes (ARGs) from Escherichia coli,
where a subset of isolates had recorded resistance profiles.
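
As a rough illustration of the "parallel classification networks" idea, the sketch below applies several independent heads to one shared isolate representation. It is a minimal toy under stated assumptions, not the thesis's implementation: the embedding is a random stand-in for an encoder output, each head is a single logistic unit, and the names `make_head`, `predict_resistance`, and the antibiotic labels are hypothetical.

```python
import math
import random

def make_head(dim, rng):
    """Create one hypothetical logistic head: a weight vector and a bias."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)], 0.0

def predict_resistance(embedding, heads):
    """Apply every antibiotic-specific head to the same shared embedding."""
    probs = {}
    for antibiotic, (weights, bias) in heads.items():
        logit = sum(w * x for w, x in zip(weights, embedding)) + bias
        probs[antibiotic] = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return probs

rng = random.Random(0)
dim = 8
# Assumption: the encoder summarises one isolate as a fixed-size vector.
embedding = [rng.gauss(0.0, 1.0) for _ in range(dim)]
# Placeholder antibiotic names, not the panel used in the thesis.
names = ("antibiotic_A", "antibiotic_B", "antibiotic_C")
heads = {name: make_head(dim, rng) for name in names}
probs = predict_resistance(embedding, heads)
```

Each head produces its own probability, so an isolate can be predicted resistant to several antibiotics at once while the encoder is shared across all of them.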
The results showed that high encoder complexity is key for the model to
accurately predict resistance to antibiotics for which resistance is rare.
This is relevant for any clinical setting, as models with fewer than 12 encoder
blocks could not find these resistance profiles. The framework benefited from
pre-training on unlabeled genomic data, as performance generally increased.
However, which type of masked language model pre-training benefited the system
more was situational, and no conclusion was drawn. Finally, the thesis also found features
in the data on which the models based their decisions. The number of ARGs
in an isolate was deemed the most influential feature in the data, which relates to how
much information the transformer can process. In addition, relations between the ARGs
gyrA-D87N / parC-S80I and aph(3”)-Ib / aph(6’)-Id were shown to be an important
decision basis for the models. Likewise, two point mutations of the pmrB gene
stood out as important ARGs in the models' decision-making. The
reasons why the models weight these ARGs highly are currently unknown
but merit further study for a better understanding of the factors underlying
multi-resistance.
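
The masked language model pre-training mentioned above can be pictured as hiding some ARG tokens of an isolate and asking the model to recover them. The sketch below shows only the masking step, under assumptions: the `[MASK]` token, the function name `mask_args`, the masking rate, and the placeholder gene `gene_X` are illustrative and not taken from the thesis.

```python
import random

MASK = "[MASK]"  # assumed mask token

def mask_args(genes, mask_prob, rng):
    """Return (masked_input, targets); a target is None where nothing is masked."""
    masked, targets = [], []
    for gene in genes:
        if rng.random() < mask_prob:
            masked.append(MASK)      # hide the gene from the model
            targets.append(gene)     # the model should predict it back
        else:
            masked.append(gene)
            targets.append(None)     # position is not scored
    return masked, targets

# One isolate represented as a list of ARG tokens (gene_X is a placeholder).
isolate = ["gyrA-D87N", "parC-S80I", "gene_X"]
rng = random.Random(42)
# The masking rate here is an arbitrary choice for illustration.
masked, targets = mask_args(isolate, mask_prob=0.5, rng=rng)
```

Because this step needs no resistance labels, it can be run on isolates without recorded resistance profiles.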
Subject/keywords
Artificial intelligence, antibiotic resistance, transformer, self-attention, embedding, encoder, pre-training, fine-tuning, masked language modelling.