Machine Learning Framework for Classification of Potential Hereditary Cancers (University of Tartu, 2024)
Marandi, Markus; Pata, Villem; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Technology

This thesis investigates a machine learning model that classifies potentially pathogenic genetic variants in targeted hereditary testing data. Clinical genetics generates vast amounts of data, so rapid and precise screening is essential for diagnostics, and machine learning can facilitate it. The study uses a dataset from Tartu University Hospital containing the genetic variants of 7,498 individuals, 2,449 of whom were tested because of breast cancer. All variants were reannotated with the Variant Effect Predictor (VEP), version 111, to add allele frequencies and pathogenicity scores. To train the XGBoost-based model, the fields 'IMPACT' (predicted impact of the variant), 'QUAL' (quality score of the variant call), 'DP' (read depth at the position), 'QD' (variant quality normalised by depth), and 'MAX_AF' (maximum allele frequency across populations) were selected, focusing on those most relevant to clinical evaluation practice. The study highlights a major bottleneck in rare-disease research: pathogenic variants (signal) are scarce compared with common variants (noise). The model achieved a high overall accuracy of 0.999, but this figure is dominated by the abundant negative cases. Its precision was 0.834, while its sensitivity was only 0.401 because of the low signal-to-noise ratio. In practice, the model is useful for automatically filtering out negative cases and highlighting potentially positive variants for further analysis. Because positives are so rare, the precision-recall curve depicts the model's performance more objectively than the ROC curve.
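The gap between accuracy and sensitivity, and the preference for the precision-recall curve over the ROC curve, can be illustrated with a small sketch. The confusion counts, labels and scores below are hypothetical toy values chosen to mimic a rare-positive setting; they are not the thesis data, and the helper names (`confusion_metrics`, `roc_auc`, `average_precision`) are illustrative. Average precision is used here as a single-number summary of the precision-recall curve; that is an assumption, not necessarily the summary used in the thesis.

```python
"""Why accuracy misleads on imbalanced data, and ROC vs precision-recall summaries.

All counts, labels and scores are hypothetical toy values, not thesis results.
"""


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Return accuracy, precision and sensitivity (recall) from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N), fine for toy data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean of the precision values measured at the rank of each positive (no tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data: 990 negatives spread over [0, 0.989] and 10 positives near the top
# of the score range, interleaved with the highest-scoring negatives (no ties).
NEG_SCORES = [i / 1000 for i in range(990)]
POS_SCORES = [0.9955, 0.9855, 0.9755, 0.9655, 0.9555,
              0.9455, 0.9355, 0.9255, 0.9155, 0.9055]
SCORES = NEG_SCORES + POS_SCORES
LABELS = [0] * len(NEG_SCORES) + [1] * len(POS_SCORES)

if __name__ == "__main__":
    # 10 true positives among 10,000 variants; the model flags 5, of which 4 are correct.
    print(confusion_metrics(tp=4, fp=1, tn=9989, fn=6))
    print(f"ROC AUC: {roc_auc(LABELS, SCORES):.3f}")  # ROC AUC: 0.960
    print(f"Average precision: {average_precision(LABELS, SCORES):.3f}")
```

In the hypothetical confusion counts, accuracy is 0.9993 even though only 4 of 10 positives are found. On the toy scores, the ROC AUC is high because almost every negative ranks below the positives, while average precision stays low because the handful of negatives that outrank each positive dominate precision when positives are this scarce.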
Although the model reduced the number of variant rows requiring clinical consultation by 99.96%, its ability to detect true positive cases was limited. The anonymised genetic variant dataset created during this research is a study object in its own right, enabling diagnostics to be improved with machine learning models. Future enhancements to the model may include integrating clinical data, adding further pathogenicity scores, or linking to other databases.
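A reduction figure such as 99.96% is the share of rows removed from manual review out of all rows screened. A minimal sketch of threshold-based screening follows, assuming each variant already carries a model score (in the thesis this would come from the trained XGBoost model). `ScoredVariant`, `shortlist`, the 0.5 threshold and the row counts are hypothetical, and this is not necessarily how the thesis computed its figure.

```python
"""Sketch of score-based screening and the resulting workload reduction.

Scores, threshold and row counts are hypothetical illustration values.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredVariant:
    variant_id: str  # hypothetical identifier for one variant row
    score: float     # model-predicted probability that the variant is pathogenic


def shortlist(variants: list[ScoredVariant], threshold: float = 0.5) -> list[ScoredVariant]:
    """Keep variants whose score reaches the threshold, highest score first."""
    kept = [v for v in variants if v.score >= threshold]
    return sorted(kept, key=lambda v: v.score, reverse=True)


def reduction_percent(n_total: int, n_kept: int) -> float:
    """Percentage of rows removed from manual review."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * (n_total - n_kept) / n_total


if __name__ == "__main__":
    variants = [
        ScoredVariant("v1", 0.91),
        ScoredVariant("v2", 0.03),
        ScoredVariant("v3", 0.62),
        ScoredVariant("v4", 0.48),
    ]
    print([v.variant_id for v in shortlist(variants)])  # ['v1', 'v3']
    # Hypothetical counts: keeping 40 of 100,000 rows removes 99.96% of them.
    print(reduction_percent(100_000, 40))
```

The threshold trades the two failure modes discussed above: lowering it would raise sensitivity at the cost of more rows for clinicians to review, which is why a high reduction can coexist with limited detection of true positives.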