Increasing the Power of a Classifier Calibration Test (Klassifitseerija kalibreerituse testi võimsuse suurendamine)

Valk, Kaspar

dc.contributor.advisor	Kull, Meelis, supervisor
dc.contributor.author	Valk, Kaspar
dc.contributor.other	Tartu Ülikool. Loodus- ja täppisteaduste valdkond	et
dc.contributor.other	Tartu Ülikool. Arvutiteaduse instituut	et
dc.date.accessioned	2023-11-01T14:19:06Z
dc.date.available	2023-11-01T14:19:06Z
dc.date.issued	2020
dc.description.abstract	In machine learning, a classifier is said to be calibrated if its predicted class probabilities match the actual class distribution of the data. In safety-critical classification tasks, it is important that the classifier's predictions are calibrated rather than over- or underconfident. Calibration can be evaluated using the expected calibration error (ECE), and its value can be used to construct a calibration test: a statistical test of the hypothesis that the model is calibrated. In this thesis, experiments were performed to find the parameters for calculating ECE that make the resulting calibration test as powerful as possible, that is, a test that rejects the null hypothesis of calibration as often as possible when the classifier is miscalibrated. The work concluded that, to maximize the power of the calibration test, the datapoints should be placed into separate bins when calculating ECE. If the dataset is expected to contain datapoints for which the classifier is largely miscalibrated, it is best to use a variant of ECE with a logarithmic distance measure inspired by Kullback-Leibler divergence. Otherwise, it is more reasonable to use the absolute or squared distance. These recommendations differ significantly from the conventional parameter values used to calculate ECE in the previous scientific literature. The results of this thesis allow for improved identification of miscalibration in classifiers.	et
dc.identifier.uri	https://hdl.handle.net/10062/93923
dc.language.iso	est	et
dc.publisher	Tartu Ülikool	et
dc.rights	openAccess	et
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	machine learning	et
dc.subject	classifier’s calibration	et
dc.subject	expected calibration error	et
dc.subject	power of a test	et
dc.subject.other	bachelor's theses	et
dc.subject.other	informaatika	et
dc.subject.other	infotehnoloogia	et
dc.subject.other	informatics	et
dc.subject.other	infotechnology	et
dc.title	Klassifitseerija kalibreerituse testi võimsuse suurendamine	et
dc.type	Thesis	et
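
The abstract's description of ECE and the calibration test can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names `ece` and `calibration_test`, the binary-classifier setting, the equal-width binning, the exact form of the logarithmic distance (here the KL divergence between two Bernoulli distributions), and the resampling scheme used to obtain a p-value are all assumptions made for illustration.

```python
import math
import random


def _absolute(freq, conf):
    return abs(freq - conf)


def _square(freq, conf):
    return (freq - conf) ** 2


def _log(freq, conf, eps=1e-12):
    # Assumed form of the "logarithmic distance": KL divergence between
    # Bernoulli(freq) and Bernoulli(conf), with 0 * log(0) treated as 0.
    conf = min(max(conf, eps), 1.0 - eps)
    out = 0.0
    if freq > 0:
        out += freq * math.log(freq / conf)
    if freq < 1:
        out += (1.0 - freq) * math.log((1.0 - freq) / (1.0 - conf))
    return out


DISTANCES = {"absolute": _absolute, "square": _square, "log": _log}


def ece(probs, labels, n_bins=10, distance="absolute"):
    """Binned ECE for a binary classifier (equal-width bins on [0, 1]).

    probs: predicted probabilities of the positive class.
    labels: observed labels, 0 or 1.
    """
    dist = DISTANCES[distance]
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((p, y))
    n = len(probs)
    total = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)  # mean predicted probability
        freq = sum(y for _, y in b) / len(b)  # observed positive frequency
        total += len(b) / n * dist(freq, conf)
    return total


def calibration_test(probs, labels, n_resamples=1000, seed=0, **ece_kwargs):
    """Monte Carlo p-value for the null hypothesis "the model is calibrated".

    Under the null, each label is Bernoulli(p) for its predicted p, so
    labels are simulated from probs and the observed ECE is compared with
    the simulated ECE values.
    """
    rng = random.Random(seed)
    observed = ece(probs, labels, **ece_kwargs)
    at_least_as_large = 0
    for _ in range(n_resamples):
        simulated = [1 if rng.random() < p else 0 for p in probs]
        if ece(probs, simulated, **ece_kwargs) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the p-value strictly positive.
    return (at_least_as_large + 1) / (n_resamples + 1)
```

Because `n_bins` and `distance` are free parameters, a sketch like this can be used to compare how the bin count and the distance measure change how often the test rejects the null hypothesis for a miscalibrated model.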

Files

Original bundle

Name: Valk_Informaatika_2020.pdf
Size: 3.73 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission

Collections

LTAT bakalaureusetööd – Bachelor's theses