Human genome studies with k-mer frequencies
Date
Authors
Journal title
Journal ISSN
Volume title
Publisher
Tartu Ülikooli Kirjastus
Abstract
The human genome is complex and constantly changing: mutations occur all the time. Just 25 years ago, studying the genome was slow and expensive, but advances in technology have brought a major breakthrough. In the past, researchers mainly used DNA microarrays, which detected individual changes, known as SNPs, in parallel. Today it is possible to sequence the entire genome and analyze billions of data points at once.
This study used a novel approach based on k-mer analysis. K-mers are short DNA fragments (here 25 letters long) whose frequency can be calculated without first going through the time-consuming step of comparing all of a person's sequences to a reference. This speeds up data processing and makes it possible to detect changes that earlier methods missed, especially in repetitive or technically difficult regions.
One of the key innovations of this work is the determination of Y chromosome haplogroups from a very small amount of DNA. While reliable analysis usually requires about 20× coverage, this study used a random sample amounting to less than 1% of the genome. This was possible thanks to repetitive DNA sequences on the Y chromosome, which were previously considered too complex to analyze.
The method presented in this study uses these repeats as a kind of natural "amplifier", similar to the way DNA is copied in the laboratory. Over time, these regions have accumulated unique mutations that help determine a person's paternal lineage, or haplogroup.
This technological approach, which is based on k-mer frequencies, alignment-free, and scalable, opens up new possibilities for genome research, especially where only limited data is available or where traditional methods fall short.
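As an illustration of the alignment-free idea described in the abstract, the following is a minimal Python sketch of k-mer frequency counting over raw sequencing reads. It is not the software used in the dissertation: the function name count_kmers, the decision to skip k-mers containing characters other than A, C, G, T, and the toy input are all assumptions made for this example. Only the k-mer length of 25 comes from the abstract.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def count_kmers(reads, k=25):
    """Count how often each k-mer occurs across a collection of reads.

    No reference genome or alignment is involved: each read is simply
    cut into overlapping windows of length k and the windows are tallied.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of length k along the read, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: skip windows with ambiguous bases such as 'N'.
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts

# Toy usage with k=3 so the result is easy to inspect by hand.
toy_counts = count_kmers(["ACGTACGT"], k=3)
print(toy_counts.most_common())
```

Real k-mer counters are engineered for billions of k-mers and typically use compact encodings rather than Python strings; the sketch only shows the principle that frequencies can be obtained directly from reads.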
Description
The electronic version of the dissertation does not include the publications
Keywords
doctoral theses