Tensorflow loss is already low

I'm building an AI with reinforcement learning and I'm getting weird results. The loss looks like this: Tensorflow loss: https://imgur.com/a/Twacm

While it's training, after each game it plays against a random player and then against a player using a weighted matrix, but the results go up and down: results: https://imgur.com/a/iGuu2

Basically, I'm building a reinforcement learning agent that learns to play Othello, using ε-greedy exploration, experience replay and deep networks with Keras on top of TensorFlow. I tried different activation functions such as sigmoid, relu and, in the images shown above, tanh. They all show a similar loss, but the results differ a bit. In this example the agent is learning from 100k professional games. Here is the architecture, with the default learning rate of 0.005:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=200, activation='tanh', input_shape=(64,)))  # 8*8 board flattened to 64 inputs
model.add(Dense(units=150, activation='tanh'))
model.add(Dense(units=100, activation='tanh'))
model.add(Dense(units=64, activation='tanh'))  # one output per board square
optimizer = Adam(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)  # lr defaults to 0.005
model.compile(loss=LOSS, optimizer=optimizer)

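For context on the ε-greedy and experience-replay parts mentioned above, here is a minimal sketch of how action selection and a replay buffer can look; the names (epsilon, memory, legal_moves), the buffer size and the use of model.predict are illustrative assumptions, not code taken from the linked repository:

import random
from collections import deque
import numpy as np

memory = deque(maxlen=50000)   # experience replay buffer
epsilon = 0.1                  # exploration rate for ε-greedy

def choose_action(state, legal_moves):
    # With probability epsilon pick a random legal move,
    # otherwise the legal move with the highest predicted Q-value.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    q_values = model.predict(state.reshape(1, 64))[0]   # shape (64,)
    return max(legal_moves, key=lambda a: q_values[a])

def remember(state, action, reward, next_state, done):
    memory.append((state, action, reward, next_state, done))
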
Original code: https://github.com/JordiMD92/thellia/tree/keras

So, why do I get these results? My input is 64 neurons (an 8*8 matrix), with 0 for an empty square, 1 for a black square and -1 for a white square. Is it bad to use negative inputs?
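
For reference, this is a minimal sketch of how such an encoding could be produced; the encode_board helper and the nested-list board representation are illustrative assumptions, not the code from the repository:

import numpy as np

def encode_board(board):
    # board: 8x8 nested list with 'B' (black), 'W' (white) or None (empty).
    # Returns a (64,) array with 1 for black, -1 for white, 0 for empty.
    mapping = {'B': 1.0, 'W': -1.0, None: 0.0}
    return np.array([mapping[cell] for row in board for cell in row], dtype=np.float32)

# Example: the four starting discs of an Othello game.
board = [[None] * 8 for _ in range(8)]
board[3][3], board[4][4] = 'W', 'W'
board[3][4], board[4][3] = 'B', 'B'
x = encode_board(board)   # shape (64,), values in {-1, 0, 1}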

It might be a problem with your activation function. Try using relu instead of tanh, and if you are using deep Q-learning, you might not need any activation function on the output, or take care with the optimizer that resets the weights.
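
To illustrate that suggestion, here is a sketch of the same network with relu hidden layers and a linear output layer, which is common in deep Q-learning since Q-value estimates are unbounded; the mse loss is an assumption, and the layer sizes and learning rate are copied from the question:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=200, activation='relu', input_shape=(64,)))
model.add(Dense(units=150, activation='relu'))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=64, activation='linear'))  # raw Q-values, no squashing
model.compile(loss='mse', optimizer=Adam(lr=0.005))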
