
TensorFlow reinforcement learning softmax layer

I have a problem with TensorFlow code. Here is a piece of code that I used in my previous environment, the cart-pole problem:

import tensorflow as tf

initializer = tf.contrib.layers.variance_scaling_initializer()

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
outputs = tf.nn.sigmoid(logits)  # probability of action 0 (move left)

p_left_and_right = tf.concat(axis=1, values=[outputs, 1 - outputs])
action = tf.multinomial(tf.log(p_left_and_right), num_samples=1)

y = 1. - tf.to_float(action)  # target probability for the sampled action

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)

There were two possible discrete decisions (move right and move left).

The sigmoid layer gave the probability of one action, and the action was then sampled randomly according to the probabilities derived from that layer.
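The two-action trick can be sketched in plain NumPy (a hedged illustration, not the original TF graph; the logit value is made up): the sigmoid gives p(left), and concatenating [p, 1 - p] turns it into a full categorical distribution over {left, right} that can be sampled, which is what `tf.multinomial` does with the log-probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

logit = 0.4                              # a single logit from the network (made-up value)
p_left = 1.0 / (1.0 + np.exp(-logit))    # sigmoid -> probability of "left"
probs = np.array([p_left, 1.0 - p_left]) # [p(left), p(right)], sums to 1

# sample one action index, as tf.multinomial does from log-probabilities
action = rng.choice(2, p=probs)
```

With three or more actions this concat trick no longer applies, because a softmax already outputs the full distribution.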

Now I have an environment with three possible discrete decisions, so I tried a softmax layer, but it does not work when I start the TensorFlow session. The code looks like this:

initializer = tf.contrib.layers.variance_scaling_initializer()

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)

logits = tf.layers.dense(hidden, n_outputs)

outputs = tf.nn.softmax(logits)  

p_left_and_right = tf.concat(axis=3, values=[outputs])
action = tf.multinomial(tf.log(p_left_and_right), num_samples=1)

y = 1. - tf.to_float(action)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)

How should I change or improve it to get a suitable result and correct/better TensorFlow code?

I haven't tried to run this myself, but my guess would be to drop the hacks that were introduced to map the Bernoulli case onto the more general categorical case.

To be more specific, I'd try this:


import tensorflow as tf

initializer = tf.contrib.layers.variance_scaling_initializer()

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)

# tf.multinomial samples directly from unnormalized log-probabilities,
# so the softmax and concat are unnecessary
action = tf.multinomial(logits, num_samples=1)

# softmax_cross_entropy_with_logits_v2 expects labels with the same shape
# as logits, so the sampled action index must be one-hot encoded first
y = tf.one_hot(tf.squeeze(action, axis=1), depth=n_outputs)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cross_entropy)

(I assume you use these gradients to build a proper feedback signal that also involves some returns/advantages.)
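The returns mentioned above are typically discounted rewards used to weight the gradients (the REINFORCE trick). A minimal NumPy sketch of computing discounted returns, assuming a toy reward list and a discount factor `gamma` (both made up here):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.95):
    """Compute G_t = r_t + gamma * G_{t+1} by iterating backwards."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

rewards = [1.0, 1.0, 1.0]                  # toy episode rewards
g = discounted_returns(rewards, gamma=0.5)
# G_2 = 1.0, G_1 = 1.0 + 0.5 * 1.0 = 1.5, G_0 = 1.0 + 0.5 * 1.5 = 1.75
```

Each per-step gradient is then multiplied by its (often normalized) return before being applied by the optimizer.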

The easiest solution to the problem is changing the cross-entropy function. I changed it to sparse_softmax_cross_entropy_with_logits, which does not need labels in one-hot encoding format.
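The sparse and dense variants compute the same loss; the only difference is the label format. A NumPy sketch of the equivalence, with made-up logits for a single sample and three actions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])  # one sample, three actions (made-up values)
action = np.array([0])                # sparse label: just the class index

p = softmax(logits)

# sparse_softmax_cross_entropy_with_logits: -log p[action]
sparse_loss = -np.log(p[0, action[0]])

# the dense (v2) version needs the same label one-hot encoded
one_hot = np.eye(3)[action]
dense_loss = -(one_hot * np.log(p)).sum()
```

So switching to the sparse variant just removes the need to one-hot encode the sampled action.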

import tensorflow as tf

initializer = tf.contrib.layers.variance_scaling_initializer()

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)

action = tf.multinomial(logits, num_samples=1)

# the sparse variant takes plain class indices of shape [batch_size],
# so flatten the sampled [batch_size, 1] action tensor
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=action[:, 0], logits=logits)

