Paralleelne Mustriotsing (Parallel Pattern Discovery)

Date

2013

Publisher

Tartu Ülikool

Abstract

An interesting research problem in data analysis is pattern discovery. Patterns can reveal how a dataset was formed and how it repeats itself. Because the volume of collected data is growing quickly, there is a need for algorithms that scale across multiple processes. In this thesis we examine how to parallelize an existing algorithm using three ideas: generalization, decomposition, and reification. We apply these ideas to SPEXS, a pattern discovery algorithm, and derive a new parallel algorithm, SPEXS2, which we also implement. We also analyze several problems that arose while implementing this generic algorithm. The ideas described here can be used to generalize and parallelize other algorithms as well.
