Estimation of depression level from text: symptom-based approach, external knowledge, dataset validity
Date
2024-11-26
Authors
Journal title
Journal ISSN
Volume title
Publisher
Tartu Ülikooli Kirjastus
Tartu Ülikool
University of Tartu
Abstract
Major Depressive Disorder (MDD) is one of the most prevalent mental disorders worldwide, often resulting in disability and an increased risk of suicide. The recent COVID-19 pandemic has increased depression rates around the world. Moreover, stigma and limited access to treatment prevent many people from receiving a proper diagnosis and care.
Earlier studies have found that people with depression use different vocabulary from people without it. For example, they tend to use more negative or emotional words. More recently, deep learning models have been developed to detect depression from text. However, most researchers have treated depression detection as a simple classification task with only two possible labels: depressed and non-depressed. Yet two individuals with depression may exhibit different symptoms. One person may experience insomnia and difficulty concentrating, while another may struggle with changes in appetite and low self-esteem. These people would require different treatments, so information about their symptoms is essential.
In this work, we developed a neural network that predicts depression symptoms from text. Predicting depression through symptoms gave better results than a simple classifier that predicts only diagnostic status, while also providing more detailed information. Adding external knowledge from existing sentiment and emotion lexicons improved the predictions further. For this we used a simple yet effective approach that directly marks the lexicon words in the text. Finally, while working with a social media dataset, we discovered problems with the quality of its symptom annotations. We therefore reannotated part of the dataset with the help of a mental health professional, which showed the importance of following clinical symptom definitions and establishing clear annotation guidelines.
Description
The electronic version of the dissertation does not include the publications
The dissertation was defended at the University of Caen Normandy, France
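The abstract mentions marking lexicon words directly in the text before it is passed to the model. The snippet below is only a minimal sketch of what such marking might look like, not the dissertation's implementation. The tag format (`<neg>`, `<pos>`), the toy word lists, and the function name `mark_lexicon_words` are all assumptions made for illustration, since the record does not name the lexicons or the marker scheme.

```python
import re

# Hypothetical toy lexicons; the lexicons actually used are not named in this record.
NEGATIVE_WORDS = {"sad", "hopeless", "tired"}
POSITIVE_WORDS = {"happy", "calm"}

# Assumed mapping from a marker tag to the set of words it covers.
LEXICONS = {"neg": NEGATIVE_WORDS, "pos": POSITIVE_WORDS}


def mark_lexicon_words(text, lexicons):
    """Wrap each word found in a lexicon with an inline marker, e.g. <neg>sad</neg>."""

    def replace(match):
        token = match.group(0)
        for tag, words in lexicons.items():
            # Case-insensitive lookup; the original casing is kept in the output.
            if token.lower() in words:
                return f"<{tag}>{token}</{tag}>"
        return token  # Words outside every lexicon are left unchanged.

    # Match runs of letters and apostrophes as word tokens.
    return re.sub(r"[A-Za-z']+", replace, text)


marked = mark_lexicon_words("I feel sad and tired, but calm today.", LEXICONS)
print(marked)
```

The marked string could then be tokenized and fed to a text classifier in place of the raw input, so that the lexicon information reaches the model without any change to its architecture.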