Detecting semantically equivalent issue reports using transformer models
Publisher
Tartu Ülikool
Abstract
Developers support their software development by creating issue reports that describe
bugs, feature requests, or change requests. As a project grows, the number of issue
reports grows with it, and some issues are reported multiple times by different users.
To avoid this problem, several automated approaches have been proposed for retrieving
duplicate issue reports. These approaches have mainly been based on information-retrieval
techniques. This thesis explores recent advances in detecting semantically equivalent
text in order to identify duplicate issue reports. Since several articles have been
published on this topic, the main challenge of this thesis is to replicate the existing
approaches and compare their performance with that of the proposed solution. Part of the
work is to extract and curate data from sources such as issue trackers. The thesis
treats the task as a natural language processing problem and applies advanced techniques
to classify whether question pairs are duplicates or not. We use an open-source dataset
from GitHub that has been used in many earlier projects, which makes it easy to compare
our results with theirs. We build models to detect whether two questions are
semantically the same, beginning with simple models and moving step by step to more
complex ones. After applying each model to the dataset, we record its performance and
compare the results across models.
Keywords
GitHub, Duplicated question, Natural language processing, Transformer model, Neural network