TensorFlow: how to compute the loss with policy gradients

Question

I want to compute the loss by comparing my model's predictions against the validation outputs.

My code:

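import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers)

def _build_net(self):
  self.n_actions = 3
  with tf.name_scope('inputs'):
      self.tf_obs = tf.placeholder(tf.float32, shape=(None, MAX_NUM, NUM_FEATURES),
                                   name="observations")

      # index of the action taken at each step
      self.tf_acts = tf.placeholder(tf.int32, shape=(None,),
                                    name="actions_num")

      # value that weights each action's log-probability in the loss
      self.tf_vt = tf.placeholder(tf.float32, shape=(None,),
                                  name="actions_value")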

  flattened_frames = tf.reshape(self.tf_obs, [-1, NUM_FEATURES])
  init_layers = tf.random_normal_initializer(mean=0, stddev=0.3)

  # fc1
  f1_layer = tf.layers.dense(
      inputs=flattened_frames,
      units=12,
      activation=tf.nn.tanh,  # tanh activation
      kernel_initializer=init_layers,
      bias_initializer=tf.constant_initializer(0.1),
      name='fc1'
  )
  # fc2
  f2_layer = tf.layers.dense(
      inputs=f1_layer,
      units=6,
      activation=tf.nn.tanh,  # tanh activation
      kernel_initializer=init_layers,
      bias_initializer=tf.constant_initializer(0.1),
      name='fc2'
  )
  # fc3
  all_act = tf.layers.dense(
      inputs=f2_layer,
      units=self.n_actions,
      activation=None,
      kernel_initializer=init_layers,
      bias_initializer=tf.constant_initializer(0.1),
      name='fc3'
  )

  logits = tf.reshape(all_act, [-1, MAX_NUM])
  self.all_act_prob = tf.nn.softmax(logits, name='act_prob')  


  with tf.name_scope('loss'):

      neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
          logits=all_act,
          labels=self.tf_acts
      )

      self._loss = tf.reduce_mean(neg_log_prob * self.tf_vt) 

  with tf.name_scope('train'):
      self.train_op = tf.train.AdamOptimizer(self.lr).minimize(self._loss)

How I compute the loss:

def compute_loss(self, input_data, expected_output_data):
        """
        Compute loss on the input data.

        :param input_data: numpy array of shape (number of frames, MAX_NUM, NUM_FEATURES)
        :param expected_output_data: numpy array of shape (number of frames, MAX_NUM)
        :return: training loss on the input data
        """
        return self._session.run(self._loss,
                                 feed_dict={self.tf_obs: input_data,
                                            self._target_distribution: expected_output_data})

Problem: _build_net works, but when I run compute_loss I get this error:

You must feed a value for placeholder tensor 'inputs/actions_value' with dtype float and shape [?]

[[Node: inputs/actions_value = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

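I understand that I need to feed values for self.tf_acts and self.tf_vt, but what if I don't know their values? Is there a workaround?

Also, is this the right way to compute the loss (for validation inputs/outputs) for a reinforcement learning model?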

Answer 1

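From TensorFlow's point of view, there is no way around supplying values for placeholders that the fetched tensor depends on. You cannot ask it to compute a + b without giving it a value for a.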
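
To make the dependency concrete, here is a minimal pure-Python sketch of the quantity that self._loss represents in the question: the mean of the sparse softmax cross-entropy times the per-step value. The function name policy_gradient_loss and the example numbers are hypothetical and not part of the original code; the sketch only illustrates that the actions (tf_acts) and values (tf_vt) are inputs to the computation, just like a in a + b.

```python
import math

def policy_gradient_loss(logits, actions, returns):
    """Mean of -log pi(a_t | s_t) * v_t over the batch.

    logits  : list of per-step lists of unnormalised action scores
    actions : list of chosen action indices (the role of tf_acts)
    returns : list of per-step values (the role of tf_vt)
    """
    if not (len(logits) == len(actions) == len(returns)):
        raise ValueError("logits, actions and returns must have the same length")
    if not actions:
        raise ValueError("at least one step is required")
    total = 0.0
    for step_logits, action, value in zip(logits, actions, returns):
        # Numerically stable log-sum-exp over the action scores.
        peak = max(step_logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
        # Sparse softmax cross-entropy equals -log softmax(logits)[action].
        neg_log_prob = log_norm - step_logits[action]
        total += neg_log_prob * value
    return total / len(actions)

# Two steps with three actions each; uniform logits give -log(1/3) per step.
example_loss = policy_gradient_loss(
    logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    actions=[0, 2],
    returns=[1.0, 3.0],
)
```

In the graph from the question, the equivalent is to pass all three placeholders (self.tf_obs, self.tf_acts and self.tf_vt) in the feed_dict when fetching self._loss. Leaving any of them out produces the "You must feed a value for placeholder tensor" error shown above.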
