In Search of the Best Activation Function
Date
2022
Publisher
Tartu Ülikool
Abstract
The choice of activation function in a neural network can have a large effect on the network's performance. Designing and discovering new activation functions that improve performance or address the shortcomings of existing ones is an active research field. In this thesis, a trainable activation function is proposed: a weighted linear combination of activation functions whose weights are normalized with Softmax, inspired by the DARTS neural architecture search method. The activation function is applied at the layer, kernel, and neuron levels. The activation function weights are optimized on the training loss and the validation loss, as in DARTS. The activation function was tested on two simple datasets, a sine wave and a spiral dataset, on image classification tasks, and on a robotics task. For image classification, using the trainable activation function for initial training increased accuracy by 5% over the baseline on CIFAR10 and by 1% over the baseline on ImageNet. For the robotics task, CartPole, the mean reward increased by 10 points out of a maximum of 200 when the already learned activation functions were used with Deep Q-learning; with Proximal Policy Optimization, the mean reward increased by approximately 2 points over the baseline. For future work, more difficult robotics tasks and a longer initial search for image classification could be explored.
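The following is a minimal sketch, not the thesis code, of how such a Softmax-weighted trainable activation could look in PyTorch. The module name MixedActivation and the particular candidate pool (ReLU, Tanh, Sigmoid, SiLU) are assumptions for illustration; the sketch shows the layer-level variant, with one trainable logit per candidate function.

```python
# Sketch of a DARTS-style trainable activation: a convex combination
# of candidate activation functions with Softmax-normalized weights.
# Candidate set and class name are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedActivation(nn.Module):
    """Layer-level trainable activation: output is a weighted sum of
    candidate activations, with mixing weights kept positive and
    summing to one via Softmax."""

    def __init__(self, candidates=None):
        super().__init__()
        # Hypothetical candidate pool; the abstract does not list one.
        self.candidates = candidates or [torch.relu, torch.tanh,
                                         torch.sigmoid, F.silu]
        # One trainable logit per candidate; zeros give a uniform
        # mixture at initialization.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(w[i] * f(x) for i, f in enumerate(self.candidates))
```

In a DARTS-style setup, the mixing logits (alpha here) would sit in a separate parameter group updated on the validation loss, while the ordinary network weights are updated on the training loss. The kernel- and neuron-level variants mentioned in the abstract would hold one logit vector per kernel or per neuron instead of one per layer.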
Keywords
Activation function, trainable activation function, artificial neural network, image classification, reinforcement learning, robotics, CIFAR10, ImageNet, CartPole