How to implement a Region of Interest Pooling layer in TensorFlow?

Original post: 2018-08-18 04:15:30 | Tags: python / tensorflow / neural-network / artificial-intelligence

I am trying to create a Faster R-CNN-like model. I get stuck when it comes to ROI pooling from the feature map. I know bilinear sampling can be used here, but it may not help with end-to-end training. How can I implement this ROI pooling layer in TensorFlow?

1 Solution

Bilinear sampling, as the name suggests, can actually be used even with end-to-end training, since it is basically a linear operation on the feature values and gradients can flow through it. However, the disadvantage is that your local maxima (i.e. strong excitations of certain units) could vanish because your sampling points just happen to fall close to the minima. To remedy this, you can instead apply a max_pool(features, kernel, stride) operation, where kernel and stride are adjusted so that the output of the max pool operation always has the same dimensions, whatever the size of the region.
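One way to make "adjust kernel and stride so the output size is fixed" concrete is to split the region into a fixed grid of bins and take the maximum of each bin. The following is only a minimal NumPy sketch of that arithmetic, not the answer's own code: the function name roi_max_pool, the (y0, x0, y1, x1) ROI format in feature-map cells, and the floor/ceil bin boundaries are all assumptions made for illustration.

```python
import numpy as np


def roi_max_pool(feature_map, roi, out_size):
    """Max-pool one region of interest to a fixed spatial size.

    feature_map: array of shape (H, W, C).
    roi: (y0, x0, y1, x1) in feature-map cells, end-exclusive (assumed format).
    out_size: (out_h, out_w), the fixed output size.
    """
    y0, x0, y1, x1 = roi
    region = feature_map[y0:y1, x0:x1, :]
    h, w = region.shape[:2]
    out_h, out_w = out_size
    pooled = np.empty((out_h, out_w, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        # Bin i covers rows [floor(i*h/out_h), ceil((i+1)*h/out_h)).
        r0 = (i * h) // out_h
        r1 = ((i + 1) * h + out_h - 1) // out_h
        for j in range(out_w):
            c0 = (j * w) // out_w
            c1 = ((j + 1) * w + out_w - 1) // out_w
            # Keep the strongest activation of each channel inside the bin.
            pooled[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    return pooled


# A 7x11 region and a 12x12 region both come out as 4x4.
fm = np.arange(10 * 14 * 2, dtype=np.float32).reshape(10, 14, 2)
print(roi_max_pool(fm, (1, 2, 8, 13), (4, 4)).shape)
```

When the region size is an exact multiple of the output size, the bins do not overlap and this reduces to an ordinary max pool with kernel equal to stride. Otherwise neighbouring bins may share a row or column, which is one of several possible conventions.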

An example: your features have size 12x12 and you would like to pool to 4x4. Setting kernel=(3,3) and stride=(3,3) achieves that, and for each 3x3 patch the strongest excitation in the respective feature map will be contained in the output.
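The 12x12 to 4x4 case can be checked directly with TensorFlow's built-in max pooling. This is a small sketch that assumes TensorFlow 2.x with eager execution and an NHWC layout. The feature map is made-up data (the values 0 to 143), chosen so the maximum of every 3x3 patch is easy to predict.

```python
import numpy as np
import tensorflow as tf

# Made-up 12x12 single-channel feature map, shape (batch, height, width, channels).
features = tf.constant(np.arange(144, dtype=np.float32).reshape(1, 12, 12, 1))

# A 3x3 kernel with stride 3 tiles the 12x12 map into 4x4 non-overlapping patches.
pooled = tf.nn.max_pool2d(features, ksize=[3, 3], strides=[3, 3], padding="VALID")

print(pooled.shape)  # (1, 4, 4, 1)
```

For a region whose size is not a multiple of the target size, kernel and stride have to be chosen per region (or the region handled bin by bin), since a single fixed kernel/stride pair only gives 4x4 for one input size.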