TanhSoft—Dynamic Trainable Activation Functions for Faster Learning and Better Performance

By: Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey

Format: Article
Published: IEEE, 2021-01-01

Description

Deep learning, at its core, consists of functions that compose a linear transformation with a nonlinear function known as the activation function. In the past few years, there has been increasing interest in constructing novel activation functions that yield better learning. In this work, we propose three novel activation functions with learnable parameters, namely TanhSoft-1, TanhSoft-2, and TanhSoft-3, which are shown to outperform several well-known activation functions. For instance, replacing ReLU with TanhSoft-1, TanhSoft-2, and TanhSoft-3 improves top-1 classification accuracy by 6.06%, 5.75%, and 5.38% respectively on VGG-16 (with batch normalization) and by 3.02%, 3.25%, and 2.93% respectively on PreActResNet-34 on the CIFAR-100 dataset, and by 1.76%, 1.93%, and 1.82% respectively on WideResNet 28-10 on the Tiny ImageNet dataset. TanhSoft-1, TanhSoft-2, and TanhSoft-3 also outperform ReLU in mean average precision (mAP) by 0.7%, 0.8%, and 0.6% respectively on the object detection task with the SSD300 model on the Pascal VOC dataset.
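The abstract does not spell out the functional forms of TanhSoft-1, TanhSoft-2, or TanhSoft-3. As a minimal sketch of what an activation function with learnable parameters looks like, the snippet below combines tanh and softplus with two hypothetical trainable scalars α and β; this form is an illustration of the general idea, not the paper's actual definitions.

```python
import math

def softplus(x):
    # Numerically stable softplus: log(1 + e^x).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def trainable_activation(x, alpha=1.0, beta=1.0):
    """Illustrative trainable activation: x * tanh(alpha * softplus(beta * x)).

    alpha and beta stand in for the learnable parameters described in the
    abstract; in training, they would be updated by gradient descent along
    with the network weights. This is a hypothetical stand-in, not the
    published TanhSoft formulas.
    """
    return x * math.tanh(alpha * softplus(beta * x))

# Behaves roughly like ReLU: near-identity for large positive inputs,
# smoothly saturating toward 0 for negative inputs.
print(trainable_activation(0.0))
print(trainable_activation(3.0))
print(trainable_activation(-3.0))
```

In a framework such as PyTorch, α and β would be registered as trainable parameters of a module so the optimizer updates them per layer during backpropagation.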