Keras Activation Functions Tanh Vs Sigmoid

I have an LSTM that uses binary data, i.e. the labels are all 0's or 1's.

This would lead me to use a sigmoid activation function, but when I do, it significantly underperforms the same model with a tanh activation function on the same data.

Why would a tanh activation function produce better accuracy even though the data is not in the (-1, 1) range that tanh outputs?

Sigmoid Activation Function Accuracy: Training-Accuracy: 60.32 %, Validation-Accuracy: 72.98 %

Tanh Activation Function Accuracy: Training-Accuracy: 83.41 %, Validation-Accuracy: 82.82 %

All the rest of the code is exactly the same. For reference, a minimal sketch of the comparison is below.
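A minimal sketch of the kind of model being compared, where the only change between runs is the LSTM's activation. The layer sizes, input shape, and hyperparameters here are placeholders, since the actual code is not shown in the post:

```python
# Hypothetical reconstruction -- shapes and sizes are illustrative only.
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_model(hidden_activation):
    model = Sequential([
        Input(shape=(30, 8)),  # (timesteps, features) -- placeholder shape
        # Only this activation is swapped between the two runs.
        LSTM(64, activation=hidden_activation),
        # Output stays sigmoid in both cases, since the labels are 0/1.
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

sigmoid_model = build_model("sigmoid")
tanh_model = build_model("tanh")  # tanh is Keras's default LSTM activation
```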

Thanks.

Convergence is usually faster if the average of each input variable over the training set is close to zero, and tanh has a zero mean. Is it likely that your data is normalized and has a mean near zero?
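A quick way to see the zero-mean point: for standardized inputs, tanh keeps the activations zero-centered, while sigmoid shifts them toward 0.5, so the signal passed on to later layers is no longer zero-mean. An illustrative NumPy check:

```python
import numpy as np

x = np.random.randn(100_000)          # zero-mean, standardized inputs
sigmoid = 1.0 / (1.0 + np.exp(-x))    # outputs in (0, 1)
tanh = np.tanh(x)                     # outputs in (-1, 1)

print(f"mean(sigmoid(x)) = {sigmoid.mean():.3f}")  # ~0.5, always positive
print(f"mean(tanh(x))    = {tanh.mean():.3f}")     # ~0.0, zero-centered
```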

Ref: https://medium.com/analytics-vidhya/activation-functions-why-tanh-outperforms-logistic-sigmoid-3f26469ac0d1

In the interval (0, 1], if the gradient is diminishing over time t, then sigmoid gives the better result; if the gradient is increasing, then the tanh activation function does.
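If this is referring to the magnitude of the activation derivatives during backpropagation (an assumption, since the answer doesn't spell it out): sigmoid's derivative peaks at 0.25 while tanh's peaks at 1.0, so gradients shrink faster through stacked sigmoid units. A small sketch comparing the two derivatives:

```python
import numpy as np

x = np.linspace(-4, 4, 9)
s = 1.0 / (1.0 + np.exp(-x))
d_sigmoid = s * (1.0 - s)          # sigmoid'(x), peaks at 0.25 when x = 0
d_tanh = 1.0 - np.tanh(x) ** 2     # tanh'(x), peaks at 1.0 when x = 0

for xi, ds, dt in zip(x, d_sigmoid, d_tanh):
    print(f"x={xi:+.1f}  sigmoid'={ds:.3f}  tanh'={dt:.3f}")
```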
