Elektronkirjade klassifitseerimine (Classification of Emails)
Date
2011
Publisher
Tartu Ülikool
Abstract
Today email is one of the most widely used applications ever invented for the computer. Because the number of emails sent is growing rapidly, we increasingly face the problem that too much information arrives and finding what is needed becomes ever harder. The aim of this thesis is to give an overview of different classification methods and of ways to address this problem by classifying emails.
The thesis gives an overview of different classification methods, of keyword and key phrase extraction, and of how to divide information found in text into classes.
It also briefly introduces the email format, surveys the programs most often used for reading email, and presents statistics on the number of emails sent per year. In addition, it touches on the information overload problem caused by the large volume of email.
At the end of the thesis a practical experiment is carried out with two email clients, Microsoft Outlook and Mozilla Thunderbird, and their built-in message classification features. The experiment shows that the relevant functionality works well in both clients and is a great everyday help to the user in keeping incoming mail under control and classifying it as the user wishes, which makes the needed information easier to find.
Email is today one of the most widely used communication methods. It has been in use for decades and is used daily by organizations and individuals alike to send and receive all kinds of information. As a result, the number of email messages sent and received has grown significantly, and more than ever we face a serious message overload problem. To make messages easier to manage and find, it is reasonable to classify them according to the user's needs. Each person can develop a classification scheme that suits them. An electronic message, or email for short, consists of two parts: the message header and the message body (the email content). Using information from both parts, I will try to classify email messages so that incoming and existing emails are easier to find and manage.
This thesis aims to give an overview of what classification is and to introduce some common classification methods. A second aim is to briefly introduce the email format and the message overload problem and to look at the number of emails sent yearly. The final aim is to study the built-in features of widely used email programs to see whether they are useful for classifying emails and thereby make finding information faster and easier.
The thesis is divided into three chapters. The first chapter gives an overview of the email message format, the message overload problem, widely used email clients, and the number of emails sent. The second chapter introduces some classification methods along with information extraction, categorization, and classification. In the third chapter, practical experiments show how email clients can be used to classify email messages.
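The idea of classifying a message from its header and body can be illustrated with a minimal sketch. It is not the method used in the thesis or the rule engine of any email client; it only assumes that a message is split into header fields and body text and that user-defined rules map substrings to folders. The sample message, the folder names, the `RULES` table, and the helper names `split_message` and `classify` are all hypothetical; parsing relies on Python's standard `email` module.

```python
from email import message_from_string

# Hypothetical sample message: header fields, a blank line, then the body.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Invoice for March

Please find the invoice attached.
"""

# Hypothetical user-defined rules: (folder, field, substring).
# `field` is a header name, or "body" to search the message text.
RULES = [
    ("Finance", "Subject", "invoice"),
    ("Newsletters", "body", "unsubscribe"),
]


def split_message(raw):
    """Return (headers, body) for a simple, non-multipart message."""
    msg = message_from_string(raw)
    return dict(msg.items()), msg.get_payload()


def classify(raw, rules=RULES, default="Inbox"):
    """Return the folder of the first matching rule, else `default`."""
    headers, body = split_message(raw)
    for folder, field, needle in rules:
        text = body if field == "body" else headers.get(field, "")
        if needle.lower() in text.lower():
            return folder
    return default


if __name__ == "__main__":
    print(classify(RAW))
```

Rules are checked in order and the first match wins, which resembles how filter lists in mail clients are commonly ordered, although the behavior of a specific client should be checked in its own documentation.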