Parallel Wilcoxon Signed-Rank Tests

Date

2014

Abstract

Statistical tests are used to determine whether, and how, experimental stimuli affect the quantities under study. This work examines the Wilcoxon signed-rank test, one of the few statistical tests that can be used when the natural variation within a group is not normally distributed. Such tests are commonly applied to biological data. Accordingly, the test is used by the Bioinformatics, Algorithmics and Data Mining group (BIIT) for research on gene regulation, gene expression data analysis, biological data mining, and other tasks. BIIT is a joint research group between the Department of Computer Science (University of Tartu), Quretec, and the Estonian Biocenter. Current implementations of the Wilcoxon signed-rank test are slow and unoptimized. This project examined the foundations of the test, its current implementations, and how to optimize them. To make the implementation more accurate, the relationship between the Wilcoxon statistic and its Gaussian approximation was studied. To make the implementation faster, dynamic programming methods were used to save computation time. The optimization made the test both faster and more accurate. The result of this work is an accurate and fast implementation of the Wilcoxon test as a C++ shared library. Within the scope of the project, the library was also integrated with two tools: a command-line interface and GNU R. Because it is a shared library, it is easy to integrate with any other tool one might desire.
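The abstract does not spell out which dynamic programming scheme the library uses, so the following is only an illustrative sketch of one standard approach, not the library's actual code. It counts, for n untied non-zero differences, how many sign assignments give each value of the positive-rank sum W+, and compares the resulting exact lower-tail probability with the Gaussian approximation (mean n(n+1)/4, variance n(n+1)(2n+1)/24, with a continuity correction). All function names here are hypothetical, and ties and zero differences are assumed absent.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// counts[s] = number of subsets of the ranks {1..n} that sum to s, i.e. the
// number of sign assignments with positive-rank sum W+ == s.
// Assumption: no ties and no zero differences. Counts are stored as doubles,
// which are exact only while they stay below 2^53.
std::vector<double> signed_rank_counts(int n) {
    assert(n >= 0);
    const int max_sum = n * (n + 1) / 2;
    std::vector<double> counts(max_sum + 1, 0.0);
    counts[0] = 1.0;  // the empty subset
    for (int rank = 1; rank <= n; ++rank) {
        // Iterate downward so that each rank is used at most once.
        for (int s = max_sum; s >= rank; --s) {
            counts[s] += counts[s - rank];
        }
    }
    return counts;
}

// Exact P(W+ <= w) under the null hypothesis (all 2^n sign patterns equally likely).
double exact_lower_tail(int n, int w) {
    const std::vector<double> counts = signed_rank_counts(n);
    const int size = static_cast<int>(counts.size());
    double cumulative = 0.0;
    for (int s = 0; s <= w && s < size; ++s) {
        cumulative += counts[s];
    }
    return cumulative / std::pow(2.0, n);
}

// Gaussian approximation of P(W+ <= w) with a continuity correction of 0.5.
double normal_lower_tail(int n, int w) {
    const double mean = n * (n + 1) / 4.0;
    const double sd = std::sqrt(n * (n + 1) * (2.0 * n + 1.0) / 24.0);
    const double z = (w + 0.5 - mean) / sd;
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}
```

The table of counts is built in O(n^3) time and O(n^2) memory, instead of enumerating all 2^n sign patterns. Comparing exact_lower_tail and normal_lower_tail for small n is one way to see where the Gaussian approximation starts to lose accuracy.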
