
DeepMind Deep Q-Network (DQN): 3D Convolution

Asked 2016-01-09 10:26:27 · Tags: deep-learning, conv-neural-network, q-learning

I was reading the DeepMind Nature paper on the DQN. I understood almost everything in it except one thing. I don't know why no one has asked this question before, but it seems a little odd to me anyway.

My question: the input to the DQN is an 84×84×4 image. The first convolutional layer consists of 32 filters of size 8×8 with stride 4. What exactly is the result of this convolution stage? The input is 3D, but we have 32 filters that are all 2D. How does the third dimension (which corresponds to the last 4 frames of the game) take part in the convolution?

Any ideas? Thanks, Amin

1 answer

You can think of the third dimension (the last four frames) as the input channels of the network.
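Written out for a single output unit, the first layer computes the following. This is a sketch assuming 0-indexed positions, an input $x$ of shape 84×84×4, and filter $k$ with weights $W_k$ of shape 8×8×4 and bias $b_k$ (this notation is introduced here for illustration, not taken from the paper):

```latex
y_k[i,j] = b_k + \sum_{c=0}^{3} \sum_{u=0}^{7} \sum_{v=0}^{7} W_k[u,v,c]\, x[4i+u,\, 4j+v,\, c],
\qquad 0 \le i, j < \left\lfloor \tfrac{84-8}{4} \right\rfloor + 1 = 20
```

So each of the 32 filters has 8·8·4 = 256 weights plus one bias, and the layer outputs 32 feature maps of size 20×20.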

It is the same situation as an RGB image, which has three input channels. A convolutional filter always spans every input channel, so in the DQN each of the 32 filters is really 8×8×4, not 8×8. You convolve each channel separately with its own 8×8 slice of the filter and sum the contributions (plus a bias) to get one 2D output feature map per filter. The 32 filters therefore produce 32 feature maps of size 20×20.
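As a minimal sketch, assuming NumPy, a height × width × channels layout, no padding, and random weights in place of trained ones (the helper names `conv2d_single`, `conv_layer` and `conv_layer_per_channel` are hypothetical), the code below checks that convolving with full 8×8×4 filters gives the same result as running one 2D convolution per channel and summing:

```python
import numpy as np

def conv2d_single(x, w, stride):
    """Valid (no padding) 2D cross-correlation of one channel x (H, W) with kernel w (kh, kw)."""
    kh, kw = w.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * w)
    return out

def conv_layer(x, weights, bias, stride):
    """x: (H, W, C_in); weights: (K, kh, kw, C_in); bias: (K,). Each filter spans all channels."""
    num_filters, kh, kw, _ = weights.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w, num_filters))
    for k in range(num_filters):
        for i in range(out_h):
            for j in range(out_w):
                # 3D patch (kh, kw, C_in) multiplied element-wise with the 3D filter
                patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
                out[i, j, k] = np.sum(patch * weights[k]) + bias[k]
    return out

def conv_layer_per_channel(x, weights, bias, stride):
    """Same layer computed as in the answer: one 2D conv per channel, then sum."""
    maps = []
    for k in range(weights.shape[0]):
        acc = sum(conv2d_single(x[:, :, c], weights[k, :, :, c], stride)
                  for c in range(x.shape[2]))
        maps.append(acc + bias[k])
    return np.stack(maps, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((84, 84, 4))           # 4 stacked frames as channels
weights = rng.standard_normal((32, 8, 8, 4))   # 32 filters, each 8x8x4
bias = rng.standard_normal(32)

y1 = conv_layer(x, weights, bias, stride=4)
y2 = conv_layer_per_channel(x, weights, bias, stride=4)
print(y1.shape)             # (20, 20, 32)
print(np.allclose(y1, y2))  # True
```

Both functions produce a 20×20×32 output, and the per-channel sums match the full 3D filter result up to floating-point rounding. Real frameworks vectorize this instead of looping.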

The DeepMind authors cite this paper (What is the Best Multi-Stage Architecture for Object Recognition?), which may give a better explanation.