Tensorflow loss is already low

I'm building an AI with reinforcement learning and I'm getting weird results. The loss looks like this: Tensorflow loss: https://imgur.com/a/Twacm

While it's training, after each game it plays against a random player and then against a player using a weighted matrix, but the results go up and down: results: https://imgur.com/a/iGuu2

Basically, I'm building a reinforcement learning agent that learns to play Othello, using ε-greedy exploration, experience replay and deep networks with Keras on top of TensorFlow. I tried different activation functions such as sigmoid, relu and, in the images shown above, tanh. They all show a similar loss, but the results differ a bit. In this example the agent is learning from 100k professional games. Here is the architecture, with the default learning rate of 0.005:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=200, activation='tanh', input_shape=(64,)))  # 8*8 board flattened to 64 inputs
model.add(Dense(units=150, activation='tanh'))
model.add(Dense(units=100, activation='tanh'))
model.add(Dense(units=64, activation='tanh'))  # one output per board square
optimizer = Adam(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)  # lr defaults to 0.005
model.compile(loss=LOSS, optimizer=optimizer)

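For context on the ε-greedy and experience-replay parts mentioned above, here is a minimal sketch of how action selection and a replay buffer can look; the names (epsilon, memory, legal_moves), the buffer size and the use of model.predict are illustrative assumptions, not code taken from the linked repository:

import random
from collections import deque
import numpy as np

memory = deque(maxlen=50000)   # experience replay buffer
epsilon = 0.1                  # exploration rate for ε-greedy

def choose_action(state, legal_moves):
    # With probability epsilon pick a random legal move,
    # otherwise the legal move with the highest predicted Q-value.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    q_values = model.predict(state.reshape(1, 64))[0]   # shape (64,)
    return max(legal_moves, key=lambda a: q_values[a])

def remember(state, action, reward, next_state, done):
    memory.append((state, action, reward, next_state, done))
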
Original code: https://github.com/JordiMD92/thellia/tree/keras

So, why do I get these results? My input is 64 neurons (an 8*8 matrix), with 0 for an empty square, 1 for a black square and -1 for a white square. Is it bad to use negative inputs?
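
For reference, this is a minimal sketch of how such an encoding could be produced; the encode_board helper and the nested-list board representation are illustrative assumptions, not the code from the repository:

import numpy as np

def encode_board(board):
    # board: 8x8 nested list with 'B' (black), 'W' (white) or None (empty).
    # Returns a (64,) array with 1 for black, -1 for white, 0 for empty.
    mapping = {'B': 1.0, 'W': -1.0, None: 0.0}
    return np.array([mapping[cell] for row in board for cell in row], dtype=np.float32)

# Example: the four starting discs of an Othello game.
board = [[None] * 8 for _ in range(8)]
board[3][3], board[4][4] = 'W', 'W'
board[3][4], board[4][3] = 'B', 'B'
x = encode_board(board)   # shape (64,), values in {-1, 0, 1}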

It might be a problem with your activation function. Try using relu instead of tanh, and if you are using deep Q-learning, you might not need any activation function on the output, or take care with the optimizer that resets the weights.
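
To illustrate that suggestion, here is a sketch of the same network with relu hidden layers and a linear output layer, which is common in deep Q-learning since Q-value estimates are unbounded; the mse loss is an assumption, and the layer sizes and learning rate are copied from the question:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=200, activation='relu', input_shape=(64,)))
model.add(Dense(units=150, activation='relu'))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=64, activation='linear'))  # raw Q-values, no squashing
model.compile(loss='mse', optimizer=Adam(lr=0.005))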
