Using Generative Models to Combine Static and Sequential Features for Classification
Date
2017-04-12
Abstract
Nowadays we spend a large part of our time online. We communicate in social networks, buy things in online shops, and manage bank transfers through online banking. Our activities are often tied to financial services, which carry the risk that money will be stolen. Fraud schemes are numerous and constantly changing. Service providers try to protect us from financial fraud in various ways, but this is very challenging. At the same time, because these are online services, it is possible to record data that can be used to detect fraud automatically. The data may come from different sources and take different forms. Some information is static and does not change over time, while other data are collected over a period of time, i.e. they are sequential features. To train a model that distinguishes clients from fraudsters as well as possible, it is important to use all of the available data. Catching fraudsters is one example of the many tasks that can be solved with automatic classification. In this thesis we study how to use data types such as static and sequential features and how to combine them for classification. We apply different combination schemes to three tasks from different domains. The first is automatic fraud detection. The second is classifying subjects' imagined movements from their brain signals, and the third is predicting the outcome of business processes as early as possible. The earlier we can predict that a business process may end in failure, the more time there is to intervene and improve the situation. In this work we show that we can detect fraudsters using only 4 months of data, discriminate subjects' imagined movements from brain signals with 80% accuracy, and predict the outcome of a business process early, after only 5 events have occurred. These results demonstrate that the method proposed in this work is potentially useful for solving classification problems in other domains as well.
Nowadays, a major part of our daily activities takes place online, whether we chat in social networks, shop, or manage our bank accounts. Such online activities are often accompanied by financial transactions, where suspicious activity is frequently present. Service providers do their best to protect their clients, but this is a challenging task, as fraudulent users come up with new schemes and change their strategies. Most of these online activities can be recorded, and the recorded data can be used to automate fraud detection. The data come from different sources and in different forms. Some data consist of static attributes that do not change over time; other data are sequential, meaning that they capture client behavior over time. To build a model that automatically discriminates between clients and fraudulent users, we want to incorporate all of the available data in a way that improves detection. Capturing fraudulent activity is just one example of the wide variety of problems that can be solved with automatic classification. In this thesis we investigate how to use different types of data, such as sequential and static attributes, and fuse them to improve classification. We apply various data fusion strategies to three tasks. The first is fraudulent user detection; the second is discrimination between imagined movements of patients using their brain activity signals. The third is early prediction of the outcome of business processes: the earlier we can predict that a business process will end in failure, the better the chances of intervening in time and changing an undesired outcome. The results of our work suggest that the developed approach complements existing techniques and can be useful for other real-world problems.
Keywords
mathematical models, automatic learning, classification, pattern search, business process modeling, case studies