Klassifitseerija kalibreerituse testi võimsuse suurendamine

dc.contributor.advisorKull, Meelis, juhendaja
dc.contributor.authorValk, Kaspar
dc.contributor.otherTartu Ülikool. Loodus- ja täppisteaduste valdkondet
dc.contributor.otherTartu Ülikool. Arvutiteaduse instituutet
dc.date.accessioned2023-11-01T14:19:06Z
dc.date.available2023-11-01T14:19:06Z
dc.date.issued2020
dc.description.abstractIn machine learning, a classifier is called to be calibrated if its predicted class probabilities match with the actual class distribution of the data. In classification tasks where safety is necessary, it is important that the classifier’s predictions would not be over- or underconfident but instead would be calibrated. Calibration can be evaluated using the measure ECE, and based on its value it is possible to construct a calibration test: a statistical test which allows to check if the hypothesis that the model is calibrated holds. In the thesis, experiments were performed to find optimal parameters for calculating ECE, so that the calibration test based on this would be as powerful as possible. That is, for a miscalibrated classifier the test would be able to reject the null hypothesis that the model is calibrated as frequently as possible. The work concluded that to make the calibration test as powerful as possible, the datapoints should be placed into separate bins when calculating ECE. If the dataset is expected to contain datapoints for which the classifier is largely miscalibrated, then it is best to use a variant of ECE with the logarithmic distance measure inspired by Kullback-Leibler divergence. Otherwise, it is more reasonable to use absolute or square distance. These recommendations differ significantly from conventional parameter values used when calculating ECE in previous scientific literature. The results of this thesis allow for improved identification of miscalibration in classifiers.et
dc.identifier.urihttps://hdl.handle.net/10062/93923
dc.language.isoestet
dc.publisherTartu Ülikoolet
dc.rightsopenAccesset
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International*
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/*
dc.subjectmachine learninget
dc.subjectclassifier’s calibrationet
dc.subjectexpected calibration erroret
dc.subjectpower of a testet
dc.subject.otherbakalaureusetöödet
dc.subject.otherinformaatikaet
dc.subject.otherinfotehnoloogiaet
dc.subject.otherinformaticset
dc.subject.otherinfotechnologyet
dc.titleKlassifitseerija kalibreerituse testi võimsuse suurendamineet
dc.typeThesiset

Failid

Originaal pakett

Nüüd näidatakse 1 - 1 1
Laen...
Pisipilt
Nimi:
Valk_Informaatika_2020.pdf
Suurus:
3.73 MB
Formaat:
Adobe Portable Document Format
Kirjeldus:

Litsentsi pakett

Nüüd näidatakse 1 - 1 1
Pisipilt ei ole saadaval
Nimi:
license.txt
Suurus:
1.71 KB
Formaat:
Item-specific license agreed upon to submission
Kirjeldus: