Browse by Keyword "Classification"

Now showing 1 - 4 of 4

Open access
A Web Application Supporting the Full Pipeline of Business Process Deviance Analysis
(University of Tartu, 2021) Yusifov, Sabuhi; Maggi, Fabrizio Maria, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
In business process mining, deviant cases are unusual cases in the process execution flow. Depending on their performance and outcomes, processes can deviate in negative ways (for example, a delivery process that takes too much time) or in positive ways (for example, a healthcare process in which a patient recovered very quickly). Business process deviance mining is the task of exploring the reasons behind exceptional cases in business process logs. In this thesis, we introduce a web application built on top of existing work on explaining deviant cases using sequential or declarative process patterns that characterize the cases, or a combination of both. While the existing work provided most of the application's backend, we built the web application on top of it to guide the process analyst through the entire deviance mining pipeline: from log splitting, to case labeling, to the application of classifiers that extract deviance explanations in terms of process patterns. The design and development of our application are based on a set of requirements gathered from BPM experts. In this thesis, we first present the requirements and then walk through how our implementation fulfills each of them, creating test cases for each specific requirement.
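To make the case-labeling and pattern-explanation steps of this pipeline more concrete, the following Python sketch works on a hypothetical toy event log. It assumes a duration threshold as the deviance criterion (a negative deviance, as in the delivery example) and uses simple "activity a is eventually followed by activity b" patterns as a stand-in for the sequential and declarative patterns used in the thesis. Instead of training a classifier, it ranks patterns by how much their frequency differs between deviant and normal cases. All names here are illustrative and do not come from the application itself.

```python
from itertools import permutations

# Hypothetical toy event log: case id -> ordered list of (activity, timestamp in hours).
LOG = {
    "c1": [("order", 0), ("pack", 2), ("ship", 5)],
    "c2": [("order", 0), ("check", 3), ("pack", 20), ("ship", 40)],
    "c3": [("order", 0), ("pack", 1), ("ship", 4)],
    "c4": [("order", 0), ("check", 4), ("pack", 30), ("ship", 52)],
}


def label_cases(log, max_hours):
    """Label a case deviant (1) if its duration exceeds max_hours, else normal (0)."""
    labels = {}
    for case_id, events in log.items():
        duration = events[-1][1] - events[0][1]
        labels[case_id] = int(duration > max_hours)
    return labels


def eventually_follows(events, a, b):
    """True if some occurrence of activity a is later followed by activity b."""
    seen_a = False
    for activity, _ in events:
        if seen_a and activity == b:
            return True
        if activity == a:
            seen_a = True
    return False


def pattern_features(log):
    """Binary 'a ... b' feature vector per case, for every ordered pair of distinct activities."""
    activities = sorted({act for events in log.values() for act, _ in events})
    pairs = list(permutations(activities, 2))
    features = {
        case_id: [int(eventually_follows(events, a, b)) for a, b in pairs]
        for case_id, events in log.items()
    }
    return pairs, features


def rank_patterns(log, labels):
    """Rank patterns by |frequency among deviant cases - frequency among normal cases|."""
    pairs, features = pattern_features(log)
    deviant = [c for c, y in labels.items() if y == 1]
    normal = [c for c, y in labels.items() if y == 0]
    scores = []
    for i, pair in enumerate(pairs):
        freq_dev = sum(features[c][i] for c in deviant) / len(deviant)
        freq_norm = sum(features[c][i] for c in normal) / len(normal)
        scores.append((abs(freq_dev - freq_norm), pair))
    scores.sort(reverse=True)
    return scores


labels = label_cases(LOG, max_hours=24)
for score, (a, b) in rank_patterns(LOG, labels)[:3]:
    print(f"{a} -> ... -> {b}: {score:.2f}")
```

With a 24-hour threshold, `c2` and `c4` are labeled deviant, and the top-ranked patterns are the ones involving the `check` activity, which occurs only in the slow cases. A real deviance-mining backend would feed such pattern features into a trained classifier rather than this frequency ranking.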
Open access
Machine learning for text classification in classical cryptography
(Tartu University Library, 2025) Foxon, Floe; Antal, Eugen; Marák, Pavol
This study extends previous work on text classification that distinguishes between ciphertext and gibberish. The statistical/linguistic properties of four text types were studied: meaningful English text and three gibberish types (n=1,250 each; total N=5,000). Dimension reduction techniques (PCA, t-SNE, and UMAP) were used to reduce the statistical/linguistic feature space of the texts to two dimensions, revealing distinct regions of the (lower-dimensional) feature space occupied by each text type, with some overlap. Machine learning models, including random forests, neural networks (NNs), and support vector machines (SVMs), were used to classify the four text types based on their statistical/linguistic properties. Nested cross-validation revealed better generalization performance for the NNs and SVMs, which classified texts with >90% accuracy. Applied to the Dorabella cryptogram, the models suggest that this text resembles meaningful English text more closely than the gibberish types, which is consistent with the Dorabella cryptogram being a monoalphabetic substitution cipher, but this classification should be interpreted with caution. Features that better separate meaningful English from English-like gibberish are needed, and other encryption schemes and cryptograms should be explored with these methods.
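Nested cross-validation, mentioned in the abstract above, separates hyperparameter selection (an inner loop) from performance estimation (an outer loop). The Python sketch below illustrates the idea using only the standard library. It substitutes a k-nearest-neighbours classifier and synthetic two-dimensional data for the study's random forests, NNs, SVMs and statistical/linguistic features, so every name and number here is a hypothetical stand-in rather than the study's setup.

```python
import random
from collections import Counter


def make_toy_data(n_per_class, seed=0):
    """Hypothetical 2-D feature vectors for two well-separated classes."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    def dist(item):
        (px, py), _ = item
        return (px - x[0]) ** 2 + (py - x[1]) ** 2
    nearest = sorted(train, key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose k-NN prediction matches the true label."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def folds(data, n_folds):
    """Yield (train, held_out) splits over contiguous, roughly equal-sized folds."""
    size = len(data) // n_folds
    for i in range(n_folds):
        start = i * size
        stop = len(data) if i == n_folds - 1 else start + size
        yield data[:start] + data[stop:], data[start:stop]


def nested_cv(data, candidate_ks, outer=5, inner=3):
    """Return the mean outer-fold accuracy and the k selected in each outer fold."""
    outer_scores, chosen = [], []
    for train, test in folds(data, outer):
        # Inner loop: choose k using only the outer-training portion.
        best_k = max(
            candidate_ks,
            key=lambda k: sum(accuracy(tr, va, k) for tr, va in folds(train, inner)),
        )
        chosen.append(best_k)
        # Outer loop: evaluate the chosen k on a fold never used for selection.
        outer_scores.append(accuracy(train, test, best_k))
    return sum(outer_scores) / len(outer_scores), chosen


mean_acc, ks = nested_cv(make_toy_data(50), candidate_ks=[1, 3, 5, 7])
print(f"estimated accuracy {mean_acc:.2f}, k per outer fold {ks}")
```

Because each `k` is selected without seeing the outer test fold, the averaged outer accuracy is not inflated by the tuning step. Odd values of `k` are used so that two-class votes cannot tie.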
Open access
Prediction of a movie’s box office using pre-release data
(University of Tartu, 2020) Bondarenko, Stanislav; Sharma, Rajesh, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
It is difficult to overestimate the impact of the film industry on our lives: it expands our knowledge of the world and of culture, and it entertains us. Going to the cinema has become an important leisure activity. Moreover, the total worldwide box office in 2018 reached $41B, which is not surprising given that 11,911 feature-length films were released worldwide in that year alone. Box office revenue from cinema ticket sales is the main source of profit for widely released movies. However, not all movies are profitable once the cost of production is compared with the total box office: 78% of movies released worldwide are not profitable, and 35% of profitable movies earn 80% of the total profit. Given the importance of theatrical releases and the tough competition for profit, we want to predict how successful a movie will be and whether it is worth the investment risk. Only data available before release is used, so that a prediction can be made at the earliest stages. We went through several stages typical of data mining and machine learning to obtain what is possibly the largest and most feature-rich dataset used in box office gross prediction. We use neural networks and gradient boosting machines to predict the absolute box office gross, the range within which it is likely to fall, and whether a movie will be profitable; the results obtained are very competitive in the domain.
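The abstract frames a single quantity as three prediction targets: the absolute gross (regression), the range it falls in (multi-class classification), and profitability (binary classification). A minimal Python sketch of deriving all three targets from one movie record follows. The bin edges, the range names, and the rule "profitable if gross exceeds budget" are assumptions for illustration only; the abstract does not state the thesis's actual definitions.

```python
from bisect import bisect_right

# Hypothetical bin edges in USD (assumption; not the thesis's ranges).
RANGE_EDGES = [1e6, 10e6, 50e6, 100e6]
RANGE_NAMES = ["<1M", "1M-10M", "10M-50M", "50M-100M", ">=100M"]


def gross_range(gross):
    """Map a gross value to a half-open range [edge_i, edge_i+1)."""
    return RANGE_NAMES[bisect_right(RANGE_EDGES, gross)]


def is_profitable(gross, budget):
    """Assumed profitability rule: box office gross exceeds production budget."""
    return gross > budget


def make_targets(movie):
    """Build regression, multi-class, and binary targets from one record."""
    gross = movie["gross"]
    return {
        "gross": gross,                                     # regression target
        "range": gross_range(gross),                        # multi-class target
        "profitable": is_profitable(gross, movie["budget"]),  # binary target
    }


print(make_targets({"gross": 75e6, "budget": 90e6}))
```

Keeping the three targets derived from the same record makes it easy to train separate models (for example, a regressor and two classifiers) on identical pre-release features.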
Open access
Vowel Classification from Imagined Speech Using Machine Learning
(University of Tartu, 2020) Tamm, Markus-Oliver; Muhammad, Yar, supervisor; Muhammad, Naveed, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
Imagined speech is a relatively new EEG neuro-paradigm that has seen little use in brain-computer interface (BCI) applications. It could allow physically impaired patients to communicate and to use smart devices: the patient imagines a desired command, which is then detected and executed by the device. The goal of this research is to verify previous classification attempts and then design a new, more efficient neural network that is noticeably less complex (fewer layers) yet still achieves comparable classification accuracy. The classifiers are designed to distinguish between EEG signal patterns corresponding to imagined speech of different vowels and words. This research uses a dataset of 15 subjects imagining saying the 5 main vowels (a, e, i, o, u) and 6 different words. Two previous studies on imagined speech classification performed on this same dataset are replicated, and the replication results are compared. The data pre-processing is described, and a new CNN classifier with 3 different transfer learning methods is described and used to classify EEG signals. Classification accuracy is used as the performance metric.