DQN understanding input and output (layer)

I have a question about the input and output (layer) of a DQN.
E.g. two points: P1(x1, y1) and P2(x2, y2). P1 has to walk towards P2.

I have the following information:

P1 has 4 possible actions: up, left, down, right.

How do I have to set up the input and output layers?
Is that correct? What do I have to do with the output? I got 4 arrays with 4 values each as output. Is doing argmax on the output correct?
Edit:

Input / State:
import math
import numpy as np

# x_POS, y_POS, wp_x, wp_y come from the environment

# Current position P1
state_pos = [x_POS, y_POS]
state_pos = np.asarray(state_pos, dtype=np.float32)
# Current position P2
state_wp = [wp_x, wp_y]
state_wp = np.asarray(state_wp, dtype=np.float32)
# Offset P1 -> P2
state_dist_wp = [wp_x - x_POS, wp_y - y_POS]
state_dist_wp = np.asarray(state_dist_wp, dtype=np.float32)
# Direction (unit vector) P1 -> P2
distance = [wp_x - x_POS, wp_y - y_POS]
norm = math.sqrt(distance[0] ** 2 + distance[1] ** 2)
state_direction_wp = [distance[0] / norm, distance[1] / norm]
state_direction_wp = np.asarray(state_direction_wp, dtype=np.float32)

state = [state_pos, state_wp, state_dist_wp, state_direction_wp]
state = np.array(state)  # shape (4, 2)
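As a quick sanity check (a numpy-only sketch with made-up example coordinates, independent of the network), stacking the four 2-element features this way yields a (4, 2) array, which is why the first Dense layer later sees a 3-D batch:

```python
import math
import numpy as np

# Hypothetical example coordinates for P1 and P2.
x_POS, y_POS = 1.0, 2.0
wp_x, wp_y = 4.0, 6.0

state_pos = np.asarray([x_POS, y_POS], dtype=np.float32)
state_wp = np.asarray([wp_x, wp_y], dtype=np.float32)
state_dist_wp = np.asarray([wp_x - x_POS, wp_y - y_POS], dtype=np.float32)
norm = math.sqrt((wp_x - x_POS) ** 2 + (wp_y - y_POS) ** 2)
state_direction_wp = np.asarray(
    [(wp_x - x_POS) / norm, (wp_y - y_POS) / norm], dtype=np.float32
)

state = np.array([state_pos, state_wp, state_dist_wp, state_direction_wp])
print(state.shape)  # (4, 2) -- four features, two values each
```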
Network:
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

class Agent:  # class wrapper assumed; the question shows only the methods

    def __init__(self):
        self.q_net = self._build_dqn_model()
        self.epsilon = 1

    def _build_dqn_model(self):
        q_net = Sequential()
        q_net.add(Dense(4, input_shape=(4, 2), activation='relu', kernel_initializer='he_uniform'))
        q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
        q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
        q_net.add(Dense(4, activation='linear', kernel_initializer='he_uniform'))
        rms = tf.optimizers.RMSprop(learning_rate=1e-4)
        q_net.compile(optimizer=rms, loss='mse')
        return q_net

    def random_policy(self, state):
        return np.random.randint(0, 4)

    def collect_policy(self, state):
        if np.random.random() < self.epsilon:
            return self.random_policy(state)
        return self.policy(state)

    def policy(self, state):
        # Here I get 4 arrays with 4 values each as output
        action_q = self.q_net(state)
Adding input_shape=(4,2) in the first Dense layer is causing the output shape to be (None, 4, 4). Defining q_net the following way solves it:
q_net = Sequential()
q_net.add(Reshape(target_shape=(8,), input_shape=(4, 2)))  # needs: from tensorflow.keras.layers import Reshape
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
q_net.add(Dense(4, activation='linear', kernel_initializer='he_uniform'))
rms = tf.optimizers.RMSprop(learning_rate=1e-4)
q_net.compile(optimizer=rms, loss='mse')
return q_net
Here, q_net.add(Reshape(target_shape=(8,), input_shape=(4,2))) reshapes the (None, 4, 2) input to (None, 8) [here, None represents the batch shape]. To verify, print q_net.output_shape and it should be (None, 4) [whereas in the previous case it was (None, 4, 4)].
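Keras' Reshape layer flattens in row-major (C) order, the same as numpy's default, so each feature's x and y values stay adjacent in the flattened vector. A numpy-only sketch of what the layer does to a single sample (example numbers are made up):

```python
import numpy as np

state = np.array([[1.0, 2.0],    # state_pos
                  [4.0, 6.0],    # state_wp
                  [3.0, 4.0],    # state_dist_wp
                  [0.6, 0.8]])   # state_direction_wp

# Row-major flatten, like Keras Reshape(target_shape=(8,))
flat = state.reshape(8)
print(flat)  # [1.  2.  4.  6.  3.  4.  0.6 0.8]
```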
You also need to do one more thing. Recall that input_shape does not take the batch shape into account. What I mean is, input_shape=(4,2) expects inputs of shape (batch_shape, 4, 2). Verify it by printing q_net.input_shape and it should output (None, 4, 2). Now, what you have to do is add a batch dimension to your input. You can simply do the following:
state_with_batch_dim = np.expand_dims(state,0)
And pass state_with_batch_dim to q_net as input. For example, you can call the policy method you wrote like policy(np.expand_dims(state, 0)) and get an output of dimension (batch_shape, 4) [in this case (1, 4)].
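Putting the two pieces together (batch dimension in, argmax out), here is a numpy-only sketch in which a made-up q-value row stands in for the network output:

```python
import numpy as np

state = np.zeros((4, 2), dtype=np.float32)       # placeholder state
state_with_batch_dim = np.expand_dims(state, 0)  # shape (1, 4, 2)
print(state_with_batch_dim.shape)

# Pretend the network returned these q-values for the batch of one state:
action_q = np.array([[0.1, 1.3, -0.2, 0.7]])     # shape (1, 4)
action = int(np.argmax(action_q, axis=1)[0])     # index of best action
print(action)  # 1
```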
And here are the answers to your initial questions:

Since the first layer is now a Reshape layer, the notion of nodes or units does not fit there. You can think of the Reshape layer as a placeholder that takes a tensor of shape (None, 4, 2) and outputs a reshaped tensor of shape (None, 8).

Yes, doing argmax on the output is correct: it picks the action with the highest q-value.
即可找到 q 值。 It could make sense to feed the DQN some information on the direction it's currently facing too.向 DQN 提供一些有关其当前所面临方向的信息也可能是有意义的。 You could set it up as (Current Pos X, Current Pos Y, X From Goal, Y From Goal, Direction).
您可以将其设置为 (Current Pos X, Current Pos Y, X From Goal, Y From Goal, Direction)。
The output layer should just be (Up, Left, Down, Right) in an order you determine. output 层应该按照您确定的顺序(上、左、下、右)。 An Argmax layer is suitable for the problem.
Argmax 层适用于该问题。 Exact code depends on if you using TF / Pytorch.
确切的代码取决于您是否使用 TF / Pytorch。
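If you do add the suggested "direction currently facing" feature, one possible encoding (my own sketch, not from the answer above) is the signed angle between the facing vector and the goal vector, wrapped into [-pi, pi]:

```python
import math

def heading_to_goal(facing_x, facing_y, goal_dx, goal_dy):
    """Signed angle (radians) from the facing vector to the goal vector,
    wrapped into [-pi, pi]. Hypothetical helper for the extra state feature."""
    angle = math.atan2(goal_dy, goal_dx) - math.atan2(facing_y, facing_x)
    return (angle + math.pi) % (2 * math.pi) - math.pi

# Facing along +x, goal straight up (+y): a quarter turn to the left.
print(heading_to_goal(1.0, 0.0, 0.0, 1.0))  # 1.5707963... (pi/2)
```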