TensorFlow reinforcement learning softmax layer
I have a problem with TensorFlow code. Here is a piece of code that I used in my previous environment, the Cart-pole problem:
initializer = tf.contrib.layers.variance_scaling_initializer()
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
outputs = tf.nn.sigmoid(logits)
p_left_and_right = tf.concat(axis=1, values=[outputs, 1 - outputs])
action = tf.multinomial(tf.log(p_left_and_right), num_samples=1)
y = 1. - tf.to_float(action)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)
There were two possible discrete decisions (move right and move left). The decision came from a sigmoid layer, and an action was then sampled randomly with the probabilities given by that layer.
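The sampling step described above can be sketched in plain NumPy (a sketch of the idea only; the value `p_left` and the 0/1 action encoding are illustrative assumptions, not taken from the original graph):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sigmoid output: probability of moving left (a Bernoulli parameter).
p_left = 0.7

# The original code builds a two-column categorical distribution from it:
p_left_and_right = np.array([p_left, 1.0 - p_left])

# Sampling one action index (0 = left, 1 = right) with those probabilities,
# which is what tf.multinomial(tf.log(p_left_and_right), 1) does in the graph.
action = rng.choice(2, p=p_left_and_right)
```

Over many samples, action 0 comes up with frequency close to `p_left`, which is exactly the stochastic policy behavior the sigmoid layer is meant to produce.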
Now I have an environment with three possible discrete decisions, so I tried a softmax layer, but it does not work when I start the TensorFlow session. The code looks like this:
initializer = tf.contrib.layers.variance_scaling_initializer()
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
outputs = tf.nn.softmax(logits)
p_left_and_right = tf.concat(axis=3, values=[outputs])
action = tf.multinomial(tf.log(p_left_and_right), num_samples=1)
y = 1. - tf.to_float(action)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)
How should I change or improve it to achieve a suitable result and correct/better TensorFlow code?
I haven't tried to run this myself, but my guess would be to drop the hacks that were introduced to map the Bernoulli case to the more general categorical case. To be more specific, I'd try this:
initializer = tf.contrib.layers.variance_scaling_initializer()
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
action = tf.multinomial(logits, num_samples=1)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=action, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)
(I assume you use these gradients to build a proper feedback signal that also involves some returns/advantages.)
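The returns part of that feedback signal is typically the discounted sum of rewards over an episode. A minimal sketch of that computation (the function name and `gamma` value are illustrative, not from the original code):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.95):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1} over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk the episode backwards so each step accumulates the future rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: three steps with reward 1 each (as in Cart-pole),
# with gamma=0.9 this gives G = [2.71, 1.9, 1.0].
G = discounted_returns([1.0, 1.0, 1.0], gamma=0.9)
```

In a policy-gradient setup, each cross-entropy gradient would then be scaled by the (usually normalized) return of its step before being applied.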
The easiest solution to the problem is changing the cross-entropy function. I changed it to sparse_softmax_cross_entropy_with_logits, which doesn't need labels in one-hot encoding format.
initializer = tf.contrib.layers.variance_scaling_initializer()
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
action = tf.multinomial(logits, num_samples=1)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=action[0], logits=logits)
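Why the sparse variant works with the raw sampled action: for an integer class index it computes the same quantity as the one-hot version, just without materializing the one-hot vector. A plain NumPy sketch of that equivalence (illustrative values; not TensorFlow's actual implementation):

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])    # scores for three possible actions
probs = softmax(logits)

sparse_label = 1                       # integer class index, like the sampled action
one_hot = np.eye(3)[sparse_label]      # [0., 1., 0.]

# sparse_softmax_cross_entropy_with_logits: pick out -log p of the labeled class.
ce_sparse = -np.log(probs[sparse_label])
# softmax_cross_entropy_with_logits_v2: dot the one-hot label with -log p.
ce_one_hot = -(one_hot * np.log(probs)).sum()
```

Both expressions yield the same loss, so the sparse form is simply the more convenient API when the label is already an integer index, as it is here after `tf.multinomial`.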