Enhanced Speech Emotion Recognition Using Averaged Valence Arousal Dominance Mapping and Deep Neural Networks
Date
2024
Publisher
Tartu Ülikool
Abstract
This thesis delves into advancements in speech emotion recognition (SER) by establishing a novel approach for emotion mapping and prediction using the Valence-Arousal-Dominance (VAD) model. Central to this research is the creation of reliable emotion-to-VAD mappings, achieved by averaging outcomes from multiple pre-trained networks
applied to the RAVDESS dataset. This approach adeptly resolves prior inconsistencies in
emotion-to-VAD mappings and establishes a dependable framework for SER. The study
also introduces a refined SER model, integrating the pre-trained Wav2Vec 2.0 with Long
Short-Term Memory (LSTM) networks and linear layers, culminating in an output layer
representing valence, arousal, and dominance. Notably, this model exhibits commendable
accuracy across various datasets, such as RAVDESS, EMO-DB, CREMA-D, and TESS,
thereby showcasing its robustness and adaptability, an improvement over earlier models
susceptible to dataset-specific overfitting.
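
The averaging idea behind the emotion-to-VAD mapping can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the thesis: the network names, the emotion labels shown, and every number are made-up placeholders; averaging within each network before averaging across networks is one possible choice; and the nearest-point lookup from a predicted VAD triple back to an emotion label is a hypothetical helper added only for illustration.

```python
from statistics import mean

# Hypothetical per-network VAD predictions, grouped by emotion label.
# All names and numbers are illustrative placeholders, NOT thesis results.
predictions = {
    "network_a": {
        "happy": [(0.8, 0.6, 0.6), (0.6, 0.8, 0.4)],
        "sad": [(-0.6, -0.4, -0.2)],
    },
    "network_b": {
        "happy": [(0.7, 0.5, 0.5)],
        "sad": [(-0.8, -0.2, -0.4), (-0.6, -0.4, -0.2)],
    },
}


def average_vad(points):
    """Component-wise mean of (valence, arousal, dominance) tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def build_mapping(predictions):
    """Average within each network, then across networks.

    Assumption: each network gets equal weight regardless of how many
    utterances it scored.
    """
    per_emotion = {}
    for by_emotion in predictions.values():
        for emotion, points in by_emotion.items():
            per_emotion.setdefault(emotion, []).append(average_vad(points))
    return {emotion: average_vad(means) for emotion, means in per_emotion.items()}


def nearest_emotion(vad, mapping):
    """Return the emotion whose averaged VAD point is closest (squared Euclidean)."""
    return min(
        mapping,
        key=lambda e: sum((a - b) ** 2 for a, b in zip(vad, mapping[e])),
    )


mapping = build_mapping(predictions)
label = nearest_emotion((0.5, 0.4, 0.3), mapping)
```

In this sketch, `mapping` holds one averaged VAD point per emotion, and `nearest_emotion` shows how a three-value model output could be turned back into a categorical label for comparison against labelled datasets.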
The research further unveils a comprehensive speech analysis application, adept at
denoising, segmenting, and profiling emotions in speech segments. This application
features interactive emotion tracking and sentiment reports, illustrating its practicality
in diverse applications. The study recognizes ongoing challenges in SER, especially in
managing the subjective nature of emotion perception and integrating multimodal data.
Although the research marks a progression in SER technology, it underscores the need
for continuous research and careful consideration of ethical aspects in deploying such
technologies. This thesis contributes to the SER domain by introducing a dependable
method for emotion-to-VAD mapping, a robust model for emotion recognition, and a
user-friendly application for practical implementations.
Keywords
Speech Emotion Recognition, Deep Neural Networks, LSTM, Speech Analysis, Valence, Arousal, Dominance