
Tensorflow Model Underpredicts Values with Dropout

I am having a problem implementing dropout as a regularization method in my dense NN model. It appears that adding a dropout value above 0 just scales down the predicted value, which makes me think something is not being accounted for correctly after individual weights are set to zero. I'm sure I am implementing something incorrectly, but I can't seem to figure out what.

The code to build this model was taken directly from a TensorFlow tutorial page (https://www.tensorflow.org/tutorials/keras/overfit_and_underfit), but the problem occurs no matter what architecture I use to build the model.

import tensorflow as tf
from tensorflow.keras import layers

# Four 512-unit hidden layers, each followed by 50% dropout,
# feeding a single-unit linear output for regression.
model = tf.keras.Sequential([
        layers.Dense(512, activation='relu', input_shape=[len(X_train[0])]),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1)
    ])
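
For reference, here is a minimal standalone check of how Keras handles the scaling in question (assuming TensorFlow 2.x; this snippet is illustrative and separate from my model code):

import tensorflow as tf

# A single Dropout layer applied to a constant input makes the scaling visible.
drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 8))

# Inference mode (the default for predict): dropout is a no-op.
print(drop(x, training=False).numpy())  # all ones

# Training mode: kept units are scaled up by 1/(1 - rate) ("inverted dropout"),
# so the expected activation magnitude matches inference.
print(drop(x, training=True).numpy())   # zeros and 2.0s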

Any help would be much appreciated!

[Plot generated when using a dropout rate of 0.5 between the layers]

It's perfectly normal for accuracy on the training set to decrease when adding Dropout. You usually accept this as a trade-off to increase accuracy on unseen data (the test set) and thus improve generalization.

However, try decreasing the dropout rate to 0.10 or 0.20; you will get better results. Also, unless you are dealing with hundreds of millions of examples, try reducing the number of neurons in your network, for example from 512 to 128. With an overly complex neural net, the backpropagated gradients won't reach an optimal level; with a neural net that is too simple, the gradients will saturate and it won't learn either.
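
For illustration, the same architecture with those changes applied might look like this (the input shape is a placeholder assumption, since the original feature count isn't shown):

import tensorflow as tf
from tensorflow.keras import layers

# Suggested revision: dropout lowered from 0.5 to 0.2, width from 512 to 128.
model = tf.keras.Sequential([
        layers.Dense(128, activation='relu', input_shape=[10]),
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(1)
    ])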

Another point: you may want to apply pd.get_dummies to your output (Y), increase the last layer to Dense(2), and normalize your input data.
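
A rough sketch of that suggestion (the column names, the softmax activation, and the use of StandardScaler for normalization are illustrative assumptions, not from the original post):

import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.preprocessing import StandardScaler

# Hypothetical frame; 'target' stands in for the original Y.
df = pd.DataFrame({'feature_a': [0.1, 0.5, 0.9, 0.3],
                   'feature_b': [1.2, 0.7, 0.4, 1.1],
                   'target':    [0, 1, 1, 0]})

# One-hot encode the output and normalize the inputs, as suggested.
Y = pd.get_dummies(df['target']).astype('float32').values  # shape (n, 2)
X = StandardScaler().fit_transform(df[['feature_a', 'feature_b']])

model = tf.keras.Sequential([
        layers.Dense(128, activation='relu', input_shape=[X.shape[1]]),
        layers.Dropout(0.2),
        layers.Dense(2, activation='softmax')  # last layer widened to Dense(2)
    ])
model.compile(optimizer='adam', loss='categorical_crossentropy')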
