
RetinaNet feature maps dimensional issue

I've been reading a lot about object detection, and specifically about RetinaNet, but one part of the implementation is not clear to me.

It's said that the feature maps from all pyramid levels are passed to weight-shared subnetworks for classification and bounding-box regression.

But how is this possible when the weights of the subnetworks are shared across all pyramid levels? The outputs would have different dimensions, because, from my understanding, the last layer of each subnetwork is fully connected to the output, if I'm not mistaken. The original paper doesn't clarify this. Is there some zero padding happening here?

In the Faster R-CNN architecture, an ROI pooling layer is applied to address this dimensional issue, but in this case I'm lost.

All the subnetworks are fully convolutional (with standard zero padding). They don't care about the spatial dimensions (height and width) of the input feature maps.
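A minimal sketch of this point (a hypothetical head, not the paper's exact code; the channel, anchor, and class counts below are assumptions): because the head contains only convolutions, the same weights can be applied to feature maps of any height and width, and the output spatial size simply follows the input.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration: 256 input channels, 9 anchors, 20 classes.
num_classes, num_anchors, channels = 20, 9, 256

# A fully-convolutional classification head: its weights depend only on the
# channel count, never on the input's height/width.
cls_head = nn.Sequential(
    nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
    nn.Conv2d(channels, num_anchors * num_classes, 3, padding=1),
)

# Feature maps from three pyramid levels with different spatial sizes.
levels = [torch.randn(1, channels, s, s) for s in (64, 32, 16)]

# The SAME head (shared weights) runs on every level.
outputs = [cls_head(f) for f in levels]
shapes = [tuple(o.shape) for o in outputs]
# Each output keeps its level's H x W; only the channel dim is fixed at
# num_anchors * num_classes = 180, so no zero padding across levels is needed.
```

The per-level outputs are later flattened and concatenated into one list of anchor predictions, which is why differing spatial sizes are never a problem.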

The channel dimension is kept the same through the FPN structure. That part is not weight-shared.
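To illustrate that second point, here is a rough FPN sketch (the backbone channel counts are assumed ResNet-style values, and this is a simplification of the real architecture): per-level 1x1 lateral convolutions, each with its own weights, project every backbone stage to a common channel width, so all pyramid levels feed the shared heads with the same channel depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

out_channels = 256                        # common FPN channel width
backbone_channels = [512, 1024, 2048]     # assumed ResNet C3, C4, C5 widths

# One lateral 1x1 conv per level -- these are NOT weight-shared.
laterals = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in backbone_channels)

# Backbone features: channel count grows while spatial size shrinks.
c3 = torch.randn(1, 512, 64, 64)
c4 = torch.randn(1, 1024, 32, 32)
c5 = torch.randn(1, 2048, 16, 16)

# Top-down pathway: project to 256 channels, upsample, and merge.
p5 = laterals[2](c5)
p4 = laterals[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
p3 = laterals[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
# All pyramid levels now have 256 channels; only their H x W differ.
```

After this projection, a single weight-shared head (as in the previous sketch) can run on P3 through P5 without any dimension mismatch.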

