Keras Activation Functions Tanh Vs Sigmoid

I have an LSTM that uses binary data, i.e. the labels are all 0's or 1's.

This would lead me to use a sigmoid activation function, but when I do, it significantly underperforms the same model with a tanh activation function on the same data.

Why would a tanh activation function produce better accuracy even though the data is not in the (-1, 1) range that tanh outputs?

Sigmoid Activation Function Accuracy: Training-Accuracy: 60.32 %, Validation-Accuracy: 72.98 %

Tanh Activation Function Accuracy: Training-Accuracy: 83.41 %, Validation-Accuracy: 82.82 %

All the rest of the code is exactly the same. For reference, a minimal sketch of the comparison is below.
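A minimal sketch of the kind of model being compared, where the only change between runs is the LSTM's activation. The layer sizes, input shape, and hyperparameters here are placeholders, since the actual code is not shown in the post:

```python
# Hypothetical reconstruction -- shapes and sizes are illustrative only.
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_model(hidden_activation):
    model = Sequential([
        Input(shape=(30, 8)),  # (timesteps, features) -- placeholder shape
        # Only this activation is swapped between the two runs.
        LSTM(64, activation=hidden_activation),
        # Output stays sigmoid in both cases, since the labels are 0/1.
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

sigmoid_model = build_model("sigmoid")
tanh_model = build_model("tanh")  # tanh is Keras's default LSTM activation
```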

Thanks.

Convergence is usually faster if the average of each input variable over the training set is close to zero, and tanh has a zero mean. Is it likely that your data is normalized and has a mean near zero?
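A quick way to see the zero-mean point: for standardized inputs, tanh keeps the activations zero-centered, while sigmoid shifts them toward 0.5, so the signal passed on to later layers is no longer zero-mean. An illustrative NumPy check:

```python
import numpy as np

x = np.random.randn(100_000)          # zero-mean, standardized inputs
sigmoid = 1.0 / (1.0 + np.exp(-x))    # outputs in (0, 1)
tanh = np.tanh(x)                     # outputs in (-1, 1)

print(f"mean(sigmoid(x)) = {sigmoid.mean():.3f}")  # ~0.5, always positive
print(f"mean(tanh(x))    = {tanh.mean():.3f}")     # ~0.0, zero-centered
```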

Ref: https://medium.com/analytics-vidhya/activation-functions-why-tanh-outperforms-logistic-sigmoid-3f26469ac0d1

In the interval (0, 1], if the gradient is diminishing over time t, then sigmoid gives the better result; if the gradient is increasing, then the tanh activation function does.
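If this is referring to the magnitude of the activation derivatives during backpropagation (an assumption, since the answer doesn't spell it out): sigmoid's derivative peaks at 0.25 while tanh's peaks at 1.0, so gradients shrink faster through stacked sigmoid units. A small sketch comparing the two derivatives:

```python
import numpy as np

x = np.linspace(-4, 4, 9)
s = 1.0 / (1.0 + np.exp(-x))
d_sigmoid = s * (1.0 - s)          # sigmoid'(x), peaks at 0.25 when x = 0
d_tanh = 1.0 - np.tanh(x) ** 2     # tanh'(x), peaks at 1.0 when x = 0

for xi, ds, dt in zip(x, d_sigmoid, d_tanh):
    print(f"x={xi:+.1f}  sigmoid'={ds:.3f}  tanh'={dt:.3f}")
```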
