Causal Models Applied to Studies within the Mining Software Repository Domain

dc.contributor.author: LEVINSSON, AMANDA
dc.contributor.author: FRANSSON, LINNÉA
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering [en]
dc.contributor.examiner: Feldt, Robert
dc.contributor.supervisor: Torkar, Richard
dc.date.accessioned: 2025-02-11T15:03:00Z
dc.date.available: 2025-02-11T15:03:00Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: Context: Research in the mining software repository domain commonly relies on observational data, since software repositories are a rich source of such data. At the same time, causality is rarely incorporated in Software Engineering (SE) research, even though statistical analyses are often conducted. Objective: To analyse the practical implications of applying causal models to studies from the Mining Software Repositories (MSR) conference. Specifically, it is of interest to examine whether researchers have accidentally included variables (colliders) in their analyses that biased their results. Method: Computer simulation was used as the research methodology. This included the steps of (1) identifying a paper with colliders by sampling from the MSR conference and constructing Directed Acyclic Graphs (DAGs), (2) a theoretical computer simulation of an SE scenario to demonstrate collider effects, and (3) computer simulations using synthetic data generated on the basis of the identified research paper. In addition, an analysis was conducted using the original data from the chosen paper. Results: A lack of transparency was identified in the research investigated, where variable selection processes and underlying assumptions were not completely clear. Three papers were investigated in the first step of constructing DAGs. Subsequently, colliders were identified in the paper of Nagy and Abdalkareem [46]. Simulations revealed that excluding collider variables improved the sought-after effect sizes. However, no practical implications could be determined. A replication package is available. Conclusion: A lack of transparency hindered the construction of DAGs and indicated a threat to advancements in research, because authors' assumptions had to be interpreted rather than read directly from their work. Incorporating causality and DAGs could, through the increased transparency it would bring, result in more robust advancements in research in the long run. Additionally, DAGs are recommended as tools to mitigate the risk of accidentally conditioning on colliders.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/309123
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Empirical Software Engineering
dc.subject: Colliders
dc.subject: Directed Acyclic Graphs
dc.subject: DAGs
dc.subject: Mining Software Repository Research
dc.subject: Causal Inference
dc.subject: Bayesian Statistics
dc.title: Causal Models Applied to Studies within the Mining Software Repository Domain
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Software engineering and technology (MPSOF), MSc
Download
Original bundle
Name: CSE 24-105 AL LF.pdf
Size: 2.88 MB
Format: Adobe Portable Document Format
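Illustrative note: the collider effect that the abstract's method step (2) refers to can be sketched with a small simulation. This is a minimal sketch under stated assumptions and not the thesis's own simulation or replication package: the variables `x`, `y` and `c`, the linear data-generating process, the effect size of 0.5 and the function name `simulate_collider` are all hypothetical, and ordinary least squares is used here in place of the Bayesian analysis that the subject keywords suggest.

```python
import numpy as np


def simulate_collider(n=20_000, true_effect=0.5, seed=0):
    """Estimate the effect of x on y with and without conditioning on a collider.

    All variables are hypothetical; nothing here is taken from the thesis data.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                     # exposure
    y = true_effect * x + rng.normal(size=n)   # outcome, truly caused by x
    c = x + y + rng.normal(size=n)             # collider: caused by both x and y

    def ols(columns, target):
        # Ordinary least squares with an intercept column.
        design = np.column_stack([np.ones(n), *columns])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return coef

    without_collider = ols([x], y)[1]   # coefficient on x, collider left out
    with_collider = ols([x, c], y)[1]   # coefficient on x, collider included
    return without_collider, with_collider


if __name__ == "__main__":
    without_c, with_c = simulate_collider()
    print(f"estimate without collider: {without_c:.3f}")
    print(f"estimate with collider:    {with_c:.3f}")
```

Under these assumptions, the model that leaves `c` out recovers an estimate close to the true effect of 0.5, while the model that includes `c` yields an estimate close to -0.25, so conditioning on the collider distorts the estimate enough to flip its sign.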