Finding Motifs in Short Peptides

Date

2013

Publisher

Tartu Ülikool

Abstract

The goal of this thesis is to develop a workflow that finds groups of similar peptides in a given set of short peptides and represents these groups as motifs. The workflow could later be used to discover motifs in the peptides of different individuals in order to find similarities between patients with the same diagnosis. To build it, several well-known methods, bioinformatics tools and additional scripts are combined. The workflow is based on hierarchical clustering, which divides the input peptides into groups according to their similarity. The resulting groups are then modified so that each contains exactly one unique motif. The motifs of the final groups are extracted and presented both as sequence logos and as regular expressions. Each motif is also assigned a score indicating how well it describes its own peptide group specifically. The workflow was assembled and tested on a single individual. The test was successful: 71.19% of the 277,166 input peptides were divided into 46 motif groups, 43 of which received very good scores. In the future, the workflow can be used to analyze further individuals in order to find motifs shared by patients with the same diagnosis.
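The abstract names the main steps (hierarchical clustering, motifs as regular expressions, a per-motif specificity score) but not their parameters. The following is a minimal sketch of that kind of pipeline under loudly stated assumptions, not the thesis' implementation: it assumes equal-length peptides, Hamming distance, average linkage with a distance cut-off, a position-wise regex built from the residues observed in each column, and a hypothetical score defined as "fraction of the group matched minus fraction of other peptides matched". All function and parameter names (`cluster`, `motif_regex`, `specificity_score`, `max_distance`, `max_class_size`) are illustrative. The naive clustering loop is far too slow for hundreds of thousands of peptides; an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be needed at that scale.

```python
import re
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length peptides")
    return sum(x != y for x, y in zip(a, b))


def average_linkage(c1, c2, peptides):
    """Mean pairwise Hamming distance between two clusters (lists of indices)."""
    total = sum(hamming(peptides[i], peptides[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(peptides, max_distance):
    """Naive agglomerative clustering (assumed average linkage): repeatedly
    merge the closest pair of clusters until the closest pair is farther
    apart than max_distance (a hypothetical cut-off)."""
    clusters = [[i] for i in range(len(peptides))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = average_linkage(clusters[a], clusters[b], peptides)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > max_distance:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return [[peptides[i] for i in c] for c in clusters]


def motif_regex(group, max_class_size=3):
    """Build a regex column by column: a literal for a conserved residue,
    a character class for a few alternatives, '.' for a variable position."""
    parts = []
    for column in zip(*group):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        elif len(residues) <= max_class_size:
            parts.append("[" + "".join(residues) + "]")
        else:
            parts.append(".")
    return "".join(parts)


def specificity_score(pattern, group, others):
    """Hypothetical score: share of the group matched by the motif minus
    share of the remaining peptides it also matches (1.0 = fully specific)."""
    rx = re.compile(pattern)
    inside = sum(bool(rx.fullmatch(p)) for p in group) / len(group)
    outside = (sum(bool(rx.fullmatch(p)) for p in others) / len(others)) if others else 0.0
    return inside - outside


if __name__ == "__main__":
    peptides = ["ACDEF", "ACDEY", "ACDQF", "WWKLM", "WWKLN", "WYKLM"]
    for group in cluster(peptides, max_distance=1.5):
        others = [p for p in peptides if p not in group]
        pattern = motif_regex(group)
        print(pattern, round(specificity_score(pattern, group, others), 2))
```

On the toy input the two families separate into groups whose motifs are `ACD[EQ][FY]` and `W[WY]KL[MN]`, each scoring 1.0 because neither pattern matches the other family. The character-class limit is one simple way to keep a motif from degenerating into a pattern that matches everything; a real workflow would tune it together with the clustering cut-off.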