Mustripõhine faktituletus eestikeelsetest tekstidest

Petmanson, Timo

dc.contributor.advisor	Laur, Sven	et
dc.contributor.author	Petmanson, Timo	et
dc.contributor.other	Tartu Ülikool. Matemaatika-informaatikateaduskond	et
dc.contributor.other	Tartu Ülikool. Arvutiteaduse instituut	et
dc.date.accessioned	2013-09-09T09:41:19Z
dc.date.available	2013-09-09T09:41:19Z
dc.date.issued	2012	et
dc.description.abstract	Processing free text is one of the hardest problems in computer science. Precise analysis of text is often difficult or impossible for computers because of ambiguity. Nevertheless, certain facts can be extracted. In this thesis we study pattern-based methods for extracting facts from Estonian texts. We apply our methodology to real texts and analyse the results. We briefly describe active learning, a methodology that makes it possible to annotate large corpora faster. In addition, we have implemented a prototype solution for annotating corpora and for performing pattern-based fact extraction.	et
dc.description.abstract	Natural language processing is one of the most difficult problems, since words and language constructions often have ambiguous meanings that cannot be resolved without extensive cultural background. However, some facts are easier to deduce than others. In this work, we consider unary, binary and ternary relations between words that can be deduced from a single sentence. The relations, represented by sets of patterns, are combined with basic machine learning methods that are used to train and deploy patterns for fact extraction. We also describe the process of active learning, which helps to speed up the annotation of relations in large corpora. Other contributions include a prototype implementation with a plain-text preprocessor, corpus annotator, pattern miner and fact extractor. Additionally, we provide an empirical study of the efficiency of the prototype implementation on several relations and corpora.	et
dc.identifier.uri	http://hdl.handle.net/10062/32988
dc.language.iso	en	et
dc.publisher	Tartu Ülikool	et
dc.subject.other	magistritööd	et
dc.subject.other	informaatika	et
dc.subject.other	infotehnoloogia	et
dc.subject.other	informatics	en
dc.subject.other	infotechnology	en
dc.title	Mustripõhine faktituletus eestikeelsetest tekstidest	et
dc.title.alternative	Pattern based fact extraction from Estonian free-texts	et
dc.type	Thesis	et

Files

Original bundle

Name: thesis.pdf
Size: 1.41 MB
Format: Adobe Portable Document Format

Collections

MTAT magistritööd – Master's theses