Automaatne andmetepõhine andmebaasi skeemade genereerimine

Edenberg, Joel

dc.contributor.advisor	Tretjakov, Konstantin	et
dc.contributor.author	Edenberg, Joel	et
dc.contributor.other	Tartu Ülikool. Matemaatika-informaatikateaduskond	et
dc.contributor.other	Tartu Ülikool. Arvutiteaduse instituut	et
dc.date.accessioned	2013-09-09T09:44:22Z
dc.date.available	2013-09-09T09:44:22Z
dc.date.issued	2012	et
dc.description.abstract	The goal of this master's thesis was to study ways of automatically generating database models and to propose a possible solution to the problem. Because creating databases is now an integral part of building information systems, and schemas are often at least partly similar, it makes sense to automate this activity. The thesis examined the nature of automatic schema generation and discussed possible solution methods. To solve the problem, one specific algorithm based on a probabilistic approach was presented. In addition to an abstract description of how the algorithm works, one possible implementation was given, with explanations and justifications. The algorithm was implemented in the Python programming language, and a graphical user interface was also created. Alternative solution methods were also discussed, covering both different approaches and ways of improving the proposed algorithm. The last part of the thesis evaluated the results of the proposed algorithm by comparing it with some alternative solutions. The work showed that although generating probabilistic answers gives satisfactory results, it also has shortcomings. Fortunately, most of the problems that arose can be solved using various machine learning approaches. The greatest difficulties were parsing the syntax of SQL schema definition files and devising the general concept of the schema generation algorithm. What made the work interesting was that there was largely no prior research to build on, so solutions to many problems had to be found along the way. In summary, I believe that the goals set for the thesis were met and that automatic schema generation was realized in an initial form. The proposed solution is by no means exhaustive, however, and could be developed further in the future.	et
dc.description.abstract	The goal of this thesis was to study the possibilities for automatically generating database schemas and to implement a proof of concept. All information systems need some means of storing information, and often a relational database is used. To store the data adequately, we need a suitable database schema. As it turns out, similar software applications also share at least partly similar database schemas, so it should be theoretically feasible to generate schemas, or parts of them, automatically. In the first chapters of the thesis we discussed some of the possible general approaches. We proposed a novel algorithm for the task of generating schemas. To find the most common or most probable solution, the proposed algorithm uses a probabilistic model. The algorithm is given a partial list of the table names desired in the resulting schema. These table names represent the data objects the user wants to store. In essence, the given table names tell the algorithm what data needs to be saved and leave it up to the program to compose the entire solution. We created one possible implementation of the proposed algorithm, written in Python. Our prototype relies heavily on dialogue-like interaction with the user. A graphical user interface was also built to improve the working experience and ease the tuning of the algorithm. Because only the user fully knows the requirements, we left several configuration parameters for the user to fine-tune. In addition to the given table names, the user can determine how many additional tables are needed (tables that the algorithm also finds relevant and appends to the solution), whether database foreign keys should also be used to find related additional tables, how many columns each table should have, and how specific the column definitions should be to the current schema.
Next we discussed alternative solutions, potential improvements and possibilities for future research. In the last part of the thesis we experimentally assessed the performance of our algorithm and compared several variations of it. We introduced a novel similarity measure between two schemas in order to estimate the quality of the answers. Due to some specifics of the chosen knowledge base (a database containing data about schema examples), our algorithm turned out not to be better than a naive first-match search. However, we believe that in practice the probabilistic algorithm yields better results.	et
dc.identifier.uri	http://hdl.handle.net/10062/33043
dc.language.iso	et	et
dc.publisher	Tartu Ülikool	et
dc.subject.other	magistritööd	et
dc.subject.other	informaatika	et
dc.subject.other	infotehnoloogia	et
dc.subject.other	informatics	en
dc.subject.other	infotechnology	en
dc.title	Automaatne andmetepõhine andmebaasi skeemade genereerimine	et
dc.title.alternative	Automatic data-driven generation of database schemas	et
dc.type	Thesis	et

Files

Original bundle

Name: thesis.pdf
Size: 1.83 MB
Format: Adobe Portable Document Format

Name: extra.rar
Size: 10.32 MB
Format: Unknown data format

Collections

LTAT magistritööd – Master's theses