How do you create a Deep Q-Learning neural network to solve simple games like Snake?
I have been working for the last four days to try to create a simple working neural network (NN) that learns. I started with the Tower of Hanoi, but that was quite tricky (doable with a Q-table) and nobody really has good examples online, so I decided to try the Snake game instead, where there are lots of examples and tutorials. Long story short, I made a new, super-simple game where you have [0,0,0,0] and, by picking 0, 1, 2, or 3, you change a 0 to a 1 or vice versa. So picking 1 would give an output of [0,1,0,0], and picking 1 again goes back to [0,0,0,0]. Very easy.
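To make the game rules concrete, here is a minimal standalone sketch of the toggle game described above (the helper names `step` and `is_solved` are my own, not from any library):

```python
def step(states, action):
    """Toggle the bit at index `action` (0-3) and return the new state."""
    new_states = list(states)
    new_states[action] = 1 - new_states[action]
    return new_states

def is_solved(states):
    """The game is complete once every position has been flipped to 1."""
    return states == [1, 1, 1, 1]

s = [0, 0, 0, 0]
s = step(s, 1)   # -> [0, 1, 0, 0]
s = step(s, 1)   # -> [0, 0, 0, 0]
```

So the optimal play is just four distinct moves (0, 1, 2, 3 in any order), which is what the network below is supposed to learn.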
Despite the game being very easy, I'm really struggling to go from concept to practice, as I have no formal education in coding.
The final goal right now is to get the code below to complete the game more than once. (It has currently run about 600 times and has not once completed the 4-step problem.)
The current network architecture is 4 inputs, 4 nodes in one hidden layer, and 4 outputs. I would like to keep it this way, even if the hidden layer is redundant, just so I can learn how to do it correctly for other problems.
If you can't be bothered to read the code (and I don't blame you), I'll put my mental pseudocode in the comments:
import tensorflow as tf  ## importing libraries
import random
import numpy as np

epsilon = 0.1  ## create non-TF variables
y = 0.4        ## note: y is used as both the discount factor and the learning rate
memory = []
memory1 = []

input_ = tf.placeholder(tf.float32, [None, 4], name='input_')
W1 = tf.Variable(tf.random_normal([4, 4], stddev=0.03), name='W1')  ## W for weights
b1 = tf.Variable(tf.random_normal([4]), name='b1')                  ## b for biases
hidden_out = tf.add(tf.matmul(input_, W1), b1, name='hidden_out')
hidden_out = tf.nn.relu(hidden_out)
W2 = tf.Variable(tf.random_normal([4, 4], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([4]), name='b2')
Qout = tf.add(tf.matmul(hidden_out, W2), b2, name='Qout')
sig_out = tf.sigmoid(Qout, name='out')  ## sig_out is the output put through a sigmoid function

Q_target = tf.placeholder(shape=(None, 4), dtype=tf.float32)
loss = tf.reduce_sum(tf.square(Q_target - Qout))
optimiser = tf.train.GradientDescentOptimizer(learning_rate=y).minimize(loss)
init_op = tf.global_variables_initializer()

with tf.compat.v1.Session() as sess:
    sess.run(init_op)
    for epoch in range(200):  ## run the game 200 times
        states = [0, 0, 0, 0]
        for turn in range(20):  ## 20 turns to do the correct 4 moves
            if turn == 19:
                memory1.append(states)
            output = np.argmax(sess.run(sig_out, feed_dict={input_: [states]}))
            if random.random() < epsilon:  ## epsilon-greedy exploration
                output = random.randint(0, 3)
            if states[output] == 0:  ## this is the code for the game:
                states[output] = 1   ## toggle the chosen position
            else:
                states[output] = 0
            reward = states
            Qout1 = sess.run(sig_out, feed_dict={input_: [states]})
            target = [reward + y * np.max(Qout1)]
            sess.run([optimiser, loss], feed_dict={input_: [states], Q_target: target})
I haven't gotten any error messages in a while with this; the ideal result would be [1,1,1,1] every time.
Thanks in advance for all of your help.
PS: I couldn't think of an objective title for this, sorry.
The reward value should be the objective value after an action has been taken. In your case, you have set reward = states. Since your function is attempting to maximize reward, the closer your state gets to [1, 1, 1, 1], the more reward your AI should receive.
Perhaps a reward function such as reward = sum(states) will solve your problem.
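As a rough sketch of how that scalar reward plugs into the Q-learning target (the helper name `compute_target` is mine; this keeps your discount factor y = 0.4 and the rest of your setup unchanged):

```python
import numpy as np

def compute_target(states, q_next, y=0.4):
    """Standard Q-learning target: scalar reward plus discounted max future Q."""
    reward = sum(states)  # closer to [1,1,1,1] -> larger reward
    return reward + y * np.max(q_next)

# Example: state [0, 1, 1, 0] and next-step Q-values all 0.5:
# reward = 2, so the target is 2 + 0.4 * 0.5 = 2.2
t = compute_target([0, 1, 1, 0], np.array([0.5, 0.5, 0.5, 0.5]))
```

Note that this yields a single number per step, whereas your Q_target placeholder expects a 4-vector; in standard Q-learning you would copy the network's current outputs and overwrite only the entry for the action actually taken.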