Institute of Computer Science
Permanent URI for this community: https://hdl.handle.net/10062/14970
Browsing Institute of Computer Science by Author "Aavola, Heili"
In-Depth Analysis of Miscalibration In Binary Classification (University of Tartu, 2025) Aavola, Heili; Allikivi, Mari-Liis, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science

Reliable probability estimates from binary classifiers are crucial for decision-making. While standard evaluation metrics provide an overall assessment of calibration quality, a closer examination of miscalibration patterns can show how calibration methods actually behave. This thesis presents an in-depth analysis of miscalibration patterns for five post-hoc calibration methods: Isotonic Calibration, Logistic Calibration, Beta Calibration, Histogram Binning, and Simplified Venn-Abers (SVA). Using a synthetic data framework with five diverse, known true calibration maps, we performed 100 simulation runs for each method-map combination. A suite of five specialized characterization plots was used to visualize error profiles, including accuracy, bias, variance, and directional tendencies in misestimation. The results reveal distinct behavioral characteristics and trade-offs. Parametric methods (Logistic, Beta) exhibited high stability but incurred significant systematic bias when their functional assumptions did not match the true probability landscape. Non-parametric methods (Isotonic, SVA) demonstrated superior adaptability and lower average error, but with step-like outputs and slightly higher variance in complex regions. Histogram Binning showed considerable artifacts tied to its fixed-bin structure. The characterization plots highlighted consistent directional biases and other error patterns not evident from aggregate metrics.
This granular view of how each calibration method behaves offers a better-informed basis for selecting an approach suited to an application's needs and risk sensitivity, moving beyond single performance scores.
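The kind of experiment the abstract describes (a known true calibration map, repeated simulation runs, and per-score bias and variance of a calibrator) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The map `true_map(s) = s**2`, the uniform score distribution, the 10 equal-width bins, the sample size, and all function names are assumptions chosen for illustration, and only Histogram Binning is shown.

```python
import random
import statistics


def true_map(s):
    """Hypothetical true calibration map: P(y = 1 | score = s) = s**2."""
    return s * s


def sample(n, rng):
    """Draw n (score, label) pairs whose labels follow true_map."""
    scores = [rng.random() for _ in range(n)]
    labels = [1 if rng.random() < true_map(s) else 0 for s in scores]
    return scores, labels


def fit_histogram_binning(scores, labels, n_bins=10):
    """Return the mean label in each equal-width bin on [0, 1]."""
    sums = [0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # Empty bins fall back to the bin midpoint (an arbitrary choice for this sketch).
    return [
        sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def predict(bin_means, s):
    """Calibrated probability for score s: the mean label of its bin."""
    n_bins = len(bin_means)
    return bin_means[min(int(s * n_bins), n_bins - 1)]


def simulate(n_runs=100, n_samples=1000, n_bins=10, seed=0):
    """Return (score, bias, variance) of the calibrator on a fixed score grid."""
    rng = random.Random(seed)
    grid = [i / 20 + 0.025 for i in range(20)]  # two grid points per bin
    preds = [[] for _ in grid]
    for _ in range(n_runs):
        scores, labels = sample(n_samples, rng)
        bin_means = fit_histogram_binning(scores, labels, n_bins)
        for i, s in enumerate(grid):
            preds[i].append(predict(bin_means, s))
    results = []
    for s, p in zip(grid, preds):
        bias = statistics.fmean(p) - true_map(s)  # signed: + means overestimation
        variance = statistics.pvariance(p)        # spread across simulation runs
        results.append((s, bias, variance))
    return results


if __name__ == "__main__":
    for s, bias, variance in simulate():
        print(f"score={s:.3f}  bias={bias:+.4f}  variance={variance:.5f}")
```

Under these assumptions, the signed bias flips within each bin: the lower grid point is overestimated and the upper one underestimated, because a bin returns one value while the true map rises across it. This is one simple instance of the fixed-bin artifacts the abstract mentions.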