Distribution and phylogeny of the bacterial translational GTPases and the Mqsr/YgiT regulatory system
Date
2013-08-30
Abstract
Proteins are the building blocks of the cell and the catalysts of the reactions essential for life. Bioinformatics has equipped us with powerful tools for sequence analysis. On the basis of sequence similarity, proteins group into families. A protein family consists of homologous sequences, that is, sequences derived from the same ancestral sequence. Proteins belonging to the same family often also share the same or a closely related function. Our knowledge of protein functions comes from a few model organisms. Scientists are often interested in how universal or specific a given described function is. How and when in the course of evolution does a protein with new properties (a new function) arise from existing material through gene duplication? How often have such events occurred on the evolutionary timescale?
In my work I have analyzed the bacterial translational GTPases (trGTPases) and the proteins of the mqsR/ygiT toxin-antitoxin (TA) system. The common denominator of the two is the protein synthesis apparatus: both are associated with the ribosome and, through it, with the cell's ability to produce proteins according to need.
The questions asked in this context can be broadly divided into two groups: a) those related to the representation of a protein family and b) those related to the evolution and functional innovation of a protein family. For the translational GTPases in bacteria we can speak of nine different families, that is, nine different sets of functions. Analysis based on complete genomes showed that four of the nine trGTPase families are conserved in bacteria: IF2, EF-Tu, EFG and LepA (EF4). Although RF3 has been assigned a canonical role in translation termination in the classical model of protein synthesis, the RF3 gene was absent from approximately 40% of the bacterial genomes analyzed. At the same time, LepA, whose function is unclear, proved to be a bacteria-specific trGTPase.
The preceding analysis also revealed the wide occurrence of EFG paralogs: many bacterial genomes contained 2 to 3 rather different (diverged) EFG genes. Closer analysis showed that all the variation in the EFG family can be divided into four subfamilies: EFG I, spdEFG1, spdEFG2 and EFG II. EFG I is experimentally well characterized. The spdEFGs have also been studied, and it has been found that the first has translocase activity in translation and the second participates in ribosome recycling. The widespread EFG II subfamily, however, is poorly studied. Phylogenetic analysis makes it possible to hypothesize an ancient origin for the four EFG subfamilies, i.e. they arose before (or at the same time as) the divergence of the eukaryotic cell form from archaea and bacteria. The functional innovation in the EFG II protein can be attributed primarily to 12 positions that are specifically conserved in the EFG II subfamily. Against the background of the high divergence characteristic of EFG II, these positions stand out in the GTPase domain, in domain II and in the fourth domain. Conserved changes in the GTPase domain, some of which lie in the GTP-binding G1 motif, allow conclusions to be drawn about altered conditions for GTP binding and hydrolysis. The increased charge at the tip of the loop of the fourth domain, which in E. coli EFG enters the A-site, allows speculation about a change in the environment of translocation. Conserved changes in the domain II region point to altered interactions between the ribosome, domain I and domain III.
Phylogenetic and sequence analysis of the EFG II subfamily clearly shows the existence of phylum/class-specific sub-subgroups. These sub-subgroups differ from one another in the conservation of the G2 motif and in the pattern of insertions/deletions. This second level describes EFG II as a phylum/class-specific factor.
What the role of EFG II actually is, and how and under what conditions it complements EFG I, still awaits answers. This work has created a framework for future experiments.
Proteins are vital for the cell: they serve as building blocks and as catalysts for many different reactions. Bioinformatics has equipped us with powerful sequence analysis tools. On the basis of sequence similarity, proteins can be grouped into families. A protein family is composed of homologous sequences, i.e. sequences that share a common ancestor. Proteins that belong to the same family often perform their functions in a similar way. Our knowledge of the functional properties of proteins originates from experimental work performed on a limited number of model organisms. Scientists are often interested in how universal or specific a described protein or function is. How often are gene duplication and subsequent innovation the source of genes/proteins with a new function? How often do such events take place on the evolutionary timescale? In my dissertation I have analyzed the gene and protein sequences of the translational GTPases (trGTPases) and the mqsR/ygiT toxin-antitoxin (TA) system of bacteria. The common denominator of the two is their connection to the cell's protein synthesis machinery. Two types of questions can be asked in this context: those related to a) the representation of specific proteins/functions, and b) evolution and functional innovation. For the trGTPases, nine different protein families were described, i.e. the presence or absence of nine different functional complexes in the cell. Analyses carried out on completed bacterial genome sequences revealed four conserved families: IF2, EF-Tu, EFG, and LepA (EF4). Although RF3 has a canonical role in the final step of translation in the classical model of protein synthesis, the gene encoding RF3 was missing from approximately 40% of the analyzed bacterial genomes. Surprisingly, LepA, whose function is still not well understood, appears to be a bacteria-specific trGTPase.
The analysis also revealed a wide distribution of EFG paralogs: many bacterial genomes contained two to three relatively diverged copies of the EFG gene. The phylogenetic tree of EFG revealed four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. The EFG I subfamily is experimentally well characterized. In addition, spdEFG1 was found to act as a translocase, while spdEFG2 helps to recycle ribosomes, indicating a functional split between co-occurring paralogs. However, little research has been done on the widely distributed EFG II subfamily. Our phylogenetic analyses allow us to propose a hypothesis of an ancient origin of the EFG subfamilies: they appeared at the same time as (or even before) the emergence of eukaryotic life forms. The functional innovation common to the whole subfamily is carried by 12 EFG II-specific positions. Against the background of high overall divergence, these conserved positions stand out in the GTPase domain and in domains II and IV. Conserved changes in the GTPase domain, some of which are located in the G1 motif, indicate altered conditions for GTP binding and hydrolysis. Increased charge in the protruding loop of the fourth domain, which inserts into the A-site, allows us to speculate about changes in the local conditions of the A-site during translocation. Conserved changes in domain II indicate altered interactions between EFG domains I, II, and III and the ribosome. Phylogenetic analysis of the EFG II subfamily reveals phylum/class-specific sub-subgroups. These sub-subgroups differ from one another in the pattern of conserved amino acids in the G2 motif and in the insertion/deletion pattern detected in the multiple sequence alignment. This second level characterizes EFG II as a phylum/class-specific factor. Further research is needed on what role EFG II actually performs and how it complements EFG I. The current study can serve as a framework for future experiments.
Description
The electronic version of the dissertation does not include the publications.
Keywords
bacterial genetics, GTP, phosphatases, translation (biol.), phylogeny