Automatic Speech-based Emotion Recognition
One of the main objectives of affective computing is the study and creation of computer systems that can detect human affect. For speech-based emotion recognition, no universal feature set offering the best performance across all languages has yet been found. In this thesis, a speech-based emotion recognition system using a novel set of features is created. Support vector machines are used as classifiers in the offline system on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database, and the Serbian emotional speech database. Average emotion recognition rates of 80.21%, 88.6%, 75.42%, and 93.41% are achieved, respectively, with a total of 87 features. The online system, which uses Random Forests as its classifier, consists of two models trained on reduced versions of the first and second databases: the first model is trained on male samples only, and the second on samples from both genders. The main purpose of the online system is to test the features' usability in real-life scenarios and to explore the effect of gender on speech-based emotion recognition. To test the online system, two female and two male non-native English speakers recorded emotionally spoken sentences, which were used as inputs to the trained models. Averaged over all emotions and speakers per model, the features outperform random guessing, achieving 28% emotion recognition in both models. The average recognition rate for female speakers was 19% in the first model and 29% in the second; for male speakers, the rates were 36% and 28%, respectively. These results show how the number of training samples available for a particular gender affects the recognition rates of a trained model.
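The offline pipeline described above (87 acoustic features per utterance, an SVM classifier, cross-validated recognition rates) can be sketched as follows. This is a minimal illustration, not the thesis's actual code: the feature values here are random placeholders standing in for extracted acoustic features, and the class count, kernel, and cross-validation scheme are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder data: 200 utterances, each described by 87 acoustic
# features (in the thesis these would come from the speech signal),
# labelled with one of 7 hypothetical emotion classes.
X = rng.normal(size=(200, 87))
y = rng.integers(0, 7, size=200)

# Standardize features, then classify with an RBF-kernel SVM;
# average accuracy over 5-fold cross-validation mirrors the
# "average emotion recognition rate" reported per database.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
mean_rate = scores.mean()
```

On real acoustic features the mean cross-validated accuracy would be the per-database recognition rate; on the random placeholders used here it is only near chance level.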