Exploring Out-of-Distribution Detection Using Vision Transformers

dc.contributor.advisor: Kull, Meelis, supervisor
dc.contributor.advisor: Leelar, Bhawani Shankar, supervisor
dc.contributor.author: Haavel, Karl Kaspar
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut
dc.date.accessioned: 2023-08-24T06:12:48Z
dc.date.available: 2023-08-24T06:12:48Z
dc.date.issued: 2022
dc.description.abstract: Current state-of-the-art artificial neural network (ANN) image classifiers perform well on input data drawn from the same distribution they were trained on, known as in-distribution (InD) data, yet perform worse on out-of-distribution (OOD) samples. An input can be OOD for many reasons: it may contain a new concept (e.g. a new class), or it may be corrupted by sensor noise. Knowing whether a new data point is OOD is necessary for deploying models in safety-critical real-world applications (e.g. self-driving cars, healthcare) so that safer decisions can be made. For example, a self-driving car can slow down when it detects an OOD object, or hand control back to the human driver. The primary method for OOD detection is to use an ANN as a feature extractor: the new data point is mapped into the embedding space and its embedding is compared to the training embeddings using distance metrics. We use a Vision Transformer (ViT) as the ANN because there has been a push to use large-scale pre-trained Transformers to improve a range of OOD tasks. The improvements stem from ViT's state-of-the-art performance as a feature extractor, which can be used out-of-the-box for OOD detection, whereas convolutional neural networks (CNNs) require custom training methods and exposure to OOD data to reach similar results. In this thesis, a ViT was used as a feature extractor, and OOD detection performance was compared across various distance metrics to determine robustness and to choose the best distance metric in ViT's embedding space. Three separate experiments were conducted with multiple datasets, methods, models and approaches. The experiments showed that ViT is capable of OOD detection out-of-the-box, without any custom training methods or exposure to OOD data. However, none of the distance metrics noticeably improved on the results obtained with the baseline Mahalanobis distance.
Nonetheless, ViT has considerably better OOD detection performance on most datasets and generalises better than a similarly trained CNN. Furthermore, ViT is more robust across distance metrics, indicating that the features extracted from the model are good enough to discriminate between InD and OOD. Finally, it was shown that ViT with the Mahalanobis distance has the best OOD detection performance when InD and OOD data are blended at various ratios. Future work could ensemble multiple distance metrics to exploit the properties of each, and apply the same methodology to other ANN architectures.
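The baseline described in the abstract, Mahalanobis distance between a new embedding and class-conditional Gaussians fitted on InD embeddings, can be sketched as follows. This is an illustrative reconstruction, not the thesis's code: embedding extraction (e.g. by a ViT) is assumed to have happened already, and the function names are our own.

```python
import numpy as np

def fit_mahalanobis(train_embeddings, train_labels):
    """Fit per-class means and a shared precision matrix on InD embeddings.

    Illustrative sketch of the common Mahalanobis baseline; the thesis's
    actual implementation details are not reproduced here.
    """
    classes = np.unique(train_labels)
    means = {c: train_embeddings[train_labels == c].mean(axis=0) for c in classes}
    # Shared (tied) covariance estimated over class-centred embeddings.
    centred = np.concatenate(
        [train_embeddings[train_labels == c] - means[c] for c in classes]
    )
    cov = np.cov(centred, rowvar=False)
    prec = np.linalg.pinv(cov)  # pseudo-inverse for numerical safety
    return means, prec

def ood_score(x, means, prec):
    """Negative of the minimum class-conditional Mahalanobis distance.

    Lower (more negative) scores mean the input is far from every InD
    class centroid, i.e. more likely to be OOD.
    """
    dists = [np.sqrt((x - mu) @ prec @ (x - mu)) for mu in means.values()]
    return -min(dists)
```

In use, embeddings of a held-out InD set would receive higher scores than OOD samples, and a threshold on the score yields the detector whose performance metrics (e.g. AUROC) the thesis compares across distance metrics.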
dc.identifier.uri: https://hdl.handle.net/10062/91717
dc.language.iso: eng
dc.publisher: Tartu Ülikool
dc.rights: openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: deep learning
dc.subject: neural networks
dc.subject: vision transformer
dc.subject: out-of-distribution detection
dc.subject.other: magistritööd (master's theses)
dc.subject.other: informaatika (informatics)
dc.subject.other: infotehnoloogia (information technology)
dc.subject.other: informatics
dc.subject.other: infotechnology
dc.title: Exploring Out-of-Distribution Detection Using Vision Transformers
dc.type: Thesis

Files

Original bundle

Name: haavel_datascience_2022.pdf
Size: 2.51 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission