
Keras Activation Functions Tanh Vs Sigmoid

I have an LSTM that uses binary data, i.e. the labels are all 0s or 1s.

This would lead me to use a sigmoid activation function, but when I do, the model significantly underperforms the same model with a tanh activation function on the same data.

Why would a tanh activation function produce better accuracy even though the data is not in the (-1, 1) output range of tanh?

Sigmoid activation function accuracy: Training: 60.32 %, Validation: 72.98 %

Tanh activation function accuracy: Training: 83.41 %, Validation: 82.82 %

All the rest of the code is exactly the same.

Thanks.

Convergence is usually faster if the average of each input variable over the training set is close to zero, and tanh has zero-mean output, whereas sigmoid's output is always positive. It's likely your data is normalized and has a mean near zero, so zero-centered activations propagate better through the hidden layers.
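To illustrate the zero-centering point, here is a minimal standalone sketch (plain Python, no Keras required) comparing the mean output of the two activations over inputs symmetric around zero:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Symmetric, zero-mean inputs in [-5, 5], mimicking normalized data.
xs = [x / 10.0 for x in range(-50, 51)]

sig_mean = sum(sigmoid(x) for x in xs) / len(xs)
tanh_mean = sum(math.tanh(x) for x in xs) / len(xs)

print(round(sig_mean, 3))   # 0.5: sigmoid outputs are biased positive
print(round(tanh_mean, 3))  # 0.0: tanh outputs stay zero-centered
```

Since sigmoid(x) + sigmoid(-x) = 1 and tanh is an odd function, the sigmoid outputs average exactly 0.5 while the tanh outputs average exactly 0 on symmetric inputs; successive layers fed sigmoid outputs therefore receive inputs with a positive mean.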

Ref: https://medium.com/analytics-vidhya/activation-functions-why-tanh-outperforms-logistic-sigmoid-3f26469ac0d1

Another factor is gradient magnitude: sigmoid's derivative is at most 0.25, while tanh's derivative reaches 1.0, so gradients shrink less per layer (and per time step in an LSTM) with tanh. When gradients are already diminishing, the stronger tanh gradients tend to help training.
