Instance-based Label Smoothing for Better Classifier Calibration
Date
2020
Publisher
Tartu Ülikool
Abstract
Binary classification is one of the fundamental tasks in machine learning,
which involves assigning one of two classes to an instance defined by a set of features.
Although accurate predictions are essential in most tasks, knowing the model's
confidence is indispensable in many of them. The predictions of many probabilistic classifiers
are not well-calibrated and tend to be overconfident, requiring further calibration as a
post-processing step after model training.
Logistic calibration is one of the most popular calibration methods: it fits a logistic
regression model to map the outputs of a classification model into calibrated class
probabilities. Various regularization methods can be applied to logistic regression
fitting to reduce its overfitting on the training set. Platt scaling is one of these methods,
which applies label smoothing to the class labels and transforms them into target probabilities
before fitting the model to reduce its overconfidence. Label smoothing is also
widely used in classification neural networks. Previous work has shown that label
smoothing has a positive calibration and generalization effect on the network predictions.
However, it erases information about the similarity structure of the classes by treating all
incorrect classes as equally probable, which impairs the distillation performance of the
network model.
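
To make the two smoothing schemes mentioned above concrete, the following is a minimal sketch, not the thesis implementation. It assumes the target formula from Platt's original method (positives mapped to (N+ + 1)/(N+ + 2), negatives to 1/(N- + 2)) and the usual uniform label smoothing for K classes. The function names, the plain gradient-descent fit, and the toy data are illustrative assumptions.

```python
import numpy as np

def platt_targets(y):
    """Smoothed binary targets as in Platt's method (assumed formula)."""
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)  # target for positive instances
    t_neg = 1.0 / (n_neg + 2.0)            # target for negative instances
    return np.where(y == 1, t_pos, t_neg)

def smooth_labels(one_hot, eps=0.1):
    """Uniform label smoothing: every incorrect class gets the same mass."""
    one_hot = np.asarray(one_hot, dtype=float)
    n_classes = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / n_classes

def fit_logistic_calibration(scores, targets, lr=0.5, n_iter=2000):
    """Fit p = sigmoid(a * s + b) to soft targets by gradient descent
    on the cross-entropy loss (hypothetical helper, for illustration)."""
    s = np.asarray(scores, dtype=float)
    t = np.asarray(targets, dtype=float)
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        a -= lr * np.mean((p - t) * s)  # d(loss)/da
        b -= lr * np.mean(p - t)        # d(loss)/db
    return a, b

# Toy example: uncalibrated scores from some classifier and their labels.
scores = np.array([-2.0, -1.0, 1.0, 2.0])
labels = np.array([0, 0, 1, 1])
a, b = fit_logistic_calibration(scores, platt_targets(labels))
calibrated = 1.0 / (1.0 + np.exp(-(a * scores + b)))
```

Because the targets are pulled away from 0 and 1, the fitted slope stays finite even on perfectly separable scores, which is how the smoothing limits overconfidence. In `smooth_labels`, all incorrect classes receive the same probability, which is the loss of similarity structure described above.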
In this thesis, we aim to find better ways of reducing overconfidence in logistic
regression. We derive the formula of a Bayesian approach for the optimal predicted
probabilities when the generative model distribution of the dataset is known.
This formula is then approximated by a sampling approach so that it can be applied in practice.
Additionally, we propose a new instance-based label smoothing method for logistic
regression fitting. This method motivated us to present a novel label smoothing approach
that enhanced the distillation and calibration performance of neural networks compared
with standard label smoothing.
The evaluation experiments confirmed that the approximated formula for the derived
optimal predictions significantly outperforms all other regularization methods on
synthetic datasets of known generative model distribution. However, in more realistic
scenarios where this distribution is unknown, our proposed instance-based label smoothing
performed better than Platt scaling on most of the synthetic and real-world datasets
in terms of log loss and calibration error. In addition, neural networks trained with
instance-based label smoothing outperformed those trained with standard label smoothing
in terms of log loss, calibration error, and network distillation.
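
As an illustration of the two evaluation metrics named above, the following sketch computes binary log loss and a binned calibration error. The abstract does not specify which calibration error variant was used, so the equal-width binning over the predicted positive-class probability, the bin count, and the function names are assumptions.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Mean negative log-likelihood of binary labels y under probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def expected_calibration_error(y, p, n_bins=10):
    """Binned calibration error (assumed variant): weighted average gap
    between the mean predicted probability and the observed positive rate."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_idx = np.digitize(p, inner_edges)  # values in 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            gap = abs(y[mask].mean() - p[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of instances in bin
    return float(ece)

# Overconfident toy predictions: 0.9 everywhere, but only half are positive.
y_true = np.array([1, 0, 1, 0])
p_pred = np.array([0.9, 0.9, 0.9, 0.9])
ece = expected_calibration_error(y_true, p_pred)
nll = log_loss(y_true, p_pred)
```

In this toy case all predictions fall into one bin, so the calibration error is the gap between the predicted probability and the observed positive rate.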
Keywords
Machine Learning, Logistic Regression, Platt Scaling, Label Smoothing, Probabilistic Classifiers, Bayesian Reasoning, Neural Networks