Disentanglement of features in variational autoencoders

Date

2022

Publisher

Tartu Ülikool

Abstract

Machine learning models, especially neural networks, have shown excellent performance in image classification. However, the features these models learn are often complex and hard to interpret. Learning disentangled features from images is one way to improve explainability and obtain features with semantic meaning. A learned feature is disentangled if it represents only a single property of an object. For example, given an image of a chair, one feature would change only its size, while another would change only the shape of its legs. Beta variational autoencoders (β-VAE) have shown promising performance in learning disentangled features from images without supervision. Given enough unlabelled data, the model can learn the features without needing large amounts of labelled data. Once the features are learned, a smaller amount of labelled data can be used to train an additional model on top of them (few-shot learning). So far, β-VAE architectures have been evaluated on simple images with known generative factors. Usually, all generative factors are independent, and the architecture assumes that there are only a few of them. Recently, a new dataset in which some features are dependent (the Boxhead dataset) has been published. Experiments on it showed that existing β-VAE-based architectures perform relatively poorly at capturing those features. Based on an exploratory analysis of β-VAE-based models, we propose a new architecture to improve the results. For evaluation, we introduce new metrics in addition to the commonly used ones. Our results show no substantial performance difference between the proposed architecture and β-VAE architectures. Based on the results of the main experiments, we conduct additional exploratory experiments on a dataset in which the object does not rotate.
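For readers unfamiliar with β-VAE, the sketch below shows the standard β-VAE training objective in NumPy: a reconstruction term plus a KL-divergence term weighted by β, where β > 1 pushes the latent code towards the factorised prior and is what encourages disentanglement. This is the generic β-VAE loss, not the architecture proposed in this thesis. The Bernoulli decoder, diagonal Gaussian encoder, standard normal prior, the function name `beta_vae_loss`, and the default `beta=4.0` are all illustrative assumptions.

```python
import numpy as np


def beta_vae_loss(x, x_recon_logits, mu, log_var, beta=4.0):
    """Negative beta-VAE ELBO averaged over a batch (illustrative sketch).

    Assumptions (not taken from the thesis):
      x              -- (batch, pixels) array with values in [0, 1]
      x_recon_logits -- (batch, pixels) decoder logits (Bernoulli decoder)
      mu, log_var    -- (batch, latents) parameters of q(z|x) = N(mu, exp(log_var))
      prior p(z)     -- standard normal N(0, I)
    """
    # Bernoulli negative log-likelihood from logits, summed over pixels.
    # -[x*log(sigmoid(l)) + (1-x)*log(1-sigmoid(l))] == softplus(l) - x*l,
    # and np.logaddexp(0, l) computes softplus(l) in a numerically stable way.
    recon = np.sum(np.logaddexp(0.0, x_recon_logits) - x * x_recon_logits, axis=1)

    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions.
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

    # beta weights the KL term; beta = 1 gives the ordinary VAE objective.
    return float(np.mean(recon + beta * kl))
```

Setting `beta=1.0` recovers the ordinary VAE loss; larger values trade reconstruction quality for a more factorised latent representation.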

Keywords

machine learning, variational autoencoder, unsupervised learning, image processing, disentanglement