
TensorFlow tf.reduce_min: how to get the minimum value from certain indices instead of the whole tensor

I'm trying to learn DQN using TensorFlow. In my action space, each state has both valid and invalid actions. I set up the q_target network as

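# assuming w and b are weight/bias initializers, passed by keyword
t1 = tf.layers.dense(s_, 20, tf.nn.relu, kernel_initializer=w, bias_initializer=b, name='t1')
q_next = tf.layers.dense(t1, n_actions, kernel_initializer=w, bias_initializer=b, name='t2')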

How can I compute the following in TensorFlow, taking the max only over the valid actions:

q_target = r + self.gamma * max(q_next(valid_actions))
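Written as an equation, this is the usual DQN target with the max restricted to the valid actions of the next state s'. The set name A_valid(s') is notation introduced here for illustration; it stands for the actions marked true in valid_actions for that state.

```latex
q_{\text{target}} = r + \gamma \max_{a \in \mathcal{A}_{\text{valid}}(s')} Q_{\text{next}}(s', a)
```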

For example:

q_target = [1, 2, 3; 4, 5, 6]
valid_actions = [true, true, false; false, true, false]
expected output: max(q_next_valid) = [2; 5]

Thank you!

Based on your example, you can implement this with the tf.math.reduce_max() method.

import tensorflow as tf  # TensorFlow 2.1.0 (eager execution)

q_target = [[1, 2, 3], [4, 5, 6]]
valid_actions = [[True, True, False], [False, True, False]]
valid_actions = tf.cast(valid_actions, dtype=tf.int32)  # True -> 1, False -> 0

# output: max(q_next_valid) = [2;5]
tf.math.reduce_max(q_target*valid_actions, axis=1, keepdims=False)  # <tf.Tensor: shape=(2,), dtype=int32, numpy=array([2, 5], dtype=int32)>
tf.math.reduce_max(q_target*valid_actions, axis=1, keepdims=True)   # <tf.Tensor: shape=(2, 1), dtype=int32, numpy=array([[2],[5]], dtype=int32)>

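I cast your Boolean mask to integers so it can be multiplied with q_target, which zeroes out the invalid entries before taking the max. With keepdims=True the reduced axis is kept with length 1, so the result has the same rank as the input; with keepdims=False that axis is dropped and the rank goes down by 1. If your Q-values are floats (as the output of tf.layers.dense is), cast the mask to the same dtype, e.g. tf.float32, because TensorFlow will not multiply an int32 tensor by a float32 tensor.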
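One limitation of multiplying by the mask: a masked-out entry becomes 0, so if every valid Q-value in a row is negative (which is common in DQN), the max returns that 0 instead of the best valid value. A minimal sketch of an alternative, assuming TensorFlow 2.x eager mode and float32 Q-values, replaces invalid entries with -inf via tf.where before reducing, then forms the target r + gamma * max. The helper masked_max, the third example row, and the values of r and gamma are hypothetical, chosen only for illustration.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)


def masked_max(q_values, valid_mask, axis=1):
    """Hypothetical helper: max of q_values over entries where valid_mask is True.

    Invalid entries are replaced with -inf (not 0) before reducing, so the
    result stays correct when all valid Q-values in a row are negative.
    """
    q_values = tf.convert_to_tensor(q_values, dtype=tf.float32)
    valid_mask = tf.convert_to_tensor(valid_mask, dtype=tf.bool)
    neg_inf = tf.fill(tf.shape(q_values), float("-inf"))
    masked = tf.where(valid_mask, q_values, neg_inf)  # keep valid, else -inf
    return tf.math.reduce_max(masked, axis=axis)


# The example from the question plus a row whose valid values are all negative.
q_next = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [-3.0, -1.0, 7.0]]
valid_actions = [[True, True, False], [False, True, False], [True, True, False]]

# Multiply-by-mask approach, for comparison: the last row wrongly gives 0.
zero_masked = tf.math.reduce_max(
    tf.constant(q_next) * tf.cast(valid_actions, tf.float32), axis=1)

best_next = masked_max(q_next, valid_actions)

# Hypothetical rewards and discount factor for the Bellman target.
r = tf.constant([0.5, 1.0, 0.0])
gamma = 0.9
q_target = r + gamma * best_next

print(zero_masked.numpy())  # [2. 5. 0.]
print(best_next.numpy())    # [ 2.  5. -1.]
```

A row with no valid actions at all would come out as -inf, so terminal or fully masked states need separate handling. tf.where and tf.math.reduce_max are also available in TF 1.x graph mode, so the same idea should carry over to the tf.layers code in the question.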

You can read more about tf.math.reduce_max() in its official documentation.
