Detecting semantically equivalent issue reports using transformer models

dc.contributor.advisor: Scott, Ezequiel, supervisor
dc.contributor.author: Moeini, Behrad
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut [et]
dc.date.accessioned: 2023-09-21T08:10:08Z
dc.date.available: 2023-09-21T08:10:08Z
dc.date.issued: 2021
dc.description.abstract: Developers support their software development by creating issue reports that describe bugs, feature requests, or change requests. As a project grows, so does the number of issue reports, and some issues are reported multiple times by different users. To avoid this duplication, several automated approaches have been proposed for retrieving duplicate issue reports. These approaches have mainly been based on information-retrieval techniques. This thesis explores recent advances in detecting semantically equivalent text and applies them to identifying duplicate issue reports. Since several articles have been published on this topic, the main challenge of this thesis is to replicate the existing approaches and compare their performance with the proposed solution. Part of the work is to extract and curate data from sources such as issue trackers. The thesis treats the task as a natural language processing problem and applies advanced techniques to classify whether question pairs are duplicates or not. We use an open-source dataset from GitHub that many earlier projects have worked with, which makes it easy to compare our results with theirs. We built models to detect whether two questions are semantically the same, beginning with simple models and moving step by step to more complex ones. After applying each model to the dataset, we compare the performance of the models. [et]
dc.identifier.uri: https://hdl.handle.net/10062/92313
dc.language.iso: eng [et]
dc.publisher: Tartu Ülikool [et]
dc.rights: openAccess [et]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Github [et]
dc.subject: Duplicated question [et]
dc.subject: Natural language processing [et]
dc.subject: Transformer model [et]
dc.subject: Neural network [et]
dc.subject.other: magistritööd [et]
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [et]
dc.subject.other: infotechnology [et]
dc.title: Detecting semantically equivalent issue reports using transformer models [et]
dc.type: Thesis [et]

Files

Original bundle

Name: moeini_computerscience_2021.pdf
Size: 459.61 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission