In Search of the Best Activation Function

Date

2022

Publisher

Tartu Ülikool

Abstract

The choice of activation function in a neural network can strongly affect the network's performance. Designing and discovering new activation functions that improve performance or solve problems of existing activation functions is an active research field. This thesis proposes a trainable activation function: a weighted linear combination of activation functions whose weights are normalized using Softmax, inspired by the DARTS network architecture search method. The activation function is applied at the layer, kernel, and neuron levels. The activation function weights are optimized on training loss and on validation loss, as was done in DARTS. The proposed activation function was tested on two simple datasets (sine wave and spiral), on image classification tasks, and on a robotics task. For image classification on CIFAR10, using the trainable activation function for initial training increased accuracy by 5% over the baseline; on ImageNet, accuracy increased by 1% over the baseline. For the robotics task, CartPole, the mean reward with Deep Q-learning increased by 10 points out of a maximum of 200 when using the already learned activation functions. With Proximal Policy Optimization, the mean reward increased by approximately 2 points over the baseline. Future work could explore more difficult robotics tasks and a longer initial search for image classification tasks.
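The Softmax-weighted combination described in the abstract can be illustrated with a minimal sketch in plain Python. The candidate set (`relu`, `tanh`, `sigmoid`, `identity`) and the names `CANDIDATES`, `softmax`, and `mixed_activation` are assumptions made for illustration; the abstract does not say which functions were combined or how the thesis implements them. In this reading, applying the function at the layer, kernel, or neuron level would correspond to keeping one logit vector per layer, per kernel, or per neuron.

```python
import math

# Hypothetical candidate set: the abstract does not list the functions used.
CANDIDATES = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "identity": lambda x: x,
}


def softmax(logits):
    """Normalize raw logits into positive weights that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_activation(x, logits):
    """Weighted linear combination of the candidate activations at input x.

    `logits` holds one trainable parameter per candidate; in a real network
    they would be updated by gradient descent together with, or alternately
    with, the ordinary network weights.
    """
    weights = softmax(logits)
    return sum(w * f(x) for w, f in zip(weights, CANDIDATES.values()))
```

With equal logits every candidate contributes equally, and as one logit grows the mixture approaches that single activation, so a learned logit vector can be read as a soft choice among the candidates.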

Keywords

activation function, trainable activation function, artificial neural network, image classification, reinforcement learning, robotics, CIFAR10, ImageNet, CartPole
