
Object detection with R-CNN?

Original post: 2017-04-13 22:36:34 | Tags: tensorflow, computer-vision, deep-learning, face-detection

What does R-CNN actually do? Does it use features extracted by a CNN to detect classes within a specified window area? Is there a TensorFlow implementation of it?

2 answers

R-CNN is the daddy algorithm of all the variants mentioned here; it really paved the way for researchers to build better and more complex algorithms on top of it. I will try to explain R-CNN and its variants.

R-CNN, or Region-based Convolutional Neural Network

R-CNN consists of 3 simple steps:

1. Scan the input image for possible objects using an algorithm called Selective Search, generating ~2000 region proposals.
2. Run a convolutional neural net (CNN) on top of each of these region proposals.
3. Take the output of each CNN and feed it into a) an SVM to classify the region and b) a linear regressor to tighten the bounding box of the object, if such an object exists.
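To make step 2 concrete, here is a minimal NumPy sketch of how each region proposal could be cropped out of the image and warped to a fixed size before it goes through the CNN. The function name `crop_and_warp`, the example boxes, and the nearest-neighbour resize are assumptions for illustration only; the 227x227 output size is likewise an assumed CNN input size, and real implementations use a proper image-resize routine.

```python
import numpy as np

def crop_and_warp(image, box, out_size=(227, 227)):
    """Crop box = (x1, y1, x2, y2) from image and resize the crop to out_size
    with nearest-neighbour sampling (hypothetical helper, not a library call)."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    out_h, out_w = out_size
    # Source row/column that each output pixel samples from.
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows[:, None], cols]

# Hypothetical inputs: a random image and two made-up proposal boxes.
image = np.random.rand(480, 640, 3)
proposals = [(10, 20, 110, 220), (300, 100, 500, 400)]

# Every proposal becomes a same-sized input, so the CNN runs once per proposal.
warped = np.stack([crop_and_warp(image, b) for b in proposals])
print(warped.shape)  # (2, 227, 227, 3)
```

With ~2000 proposals per image, this loop is exactly where R-CNN spends most of its time.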

Fast R-CNN:

Fast R-CNN followed soon after R-CNN. It is faster and better by virtue of the following points:

- Performing feature extraction over the image before proposing regions, thus running only one CNN over the entire image instead of 2000 CNNs over 2000 overlapping regions.
- Replacing the SVM with a softmax layer, thus extending the neural network for predictions instead of creating a new model.

Intuitively, it makes a lot of sense to replace 2000 separate CNN passes with a single convolutional pass over the image and build the boxes on top of the resulting feature map.
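The piece that lets one shared feature map serve many regions is a pooling step that turns a region of any size into a fixed-size grid. Below is a rough NumPy sketch of such RoI max pooling. The name `roi_max_pool`, the 2x2 output size, and the toy feature map are assumptions for illustration; the sketch also assumes each region is at least as large as the output grid.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(2, 2)):
    """Max-pool the region roi = (x1, y1, x2, y2) of an (H, W, C) feature map
    into a fixed (out_h, out_w, C) grid. Coordinates are in feature-map cells.
    Assumes the region is at least out_h x out_w cells (hypothetical helper)."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    out_h, out_w = out_size
    # Bin edges that split the region as evenly as possible.
    y_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    x_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty((out_h, out_w, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[y_edges[i]:y_edges[i + 1], x_edges[j]:x_edges[j + 1]]
            pooled[i, j] = cell.max(axis=(0, 1))
    return pooled

# Toy 6x6 feature map with one channel; the value at (row, col) is 6 * row + col.
fmap = np.arange(36, dtype=float).reshape(6, 6, 1)

a = roi_max_pool(fmap, (0, 0, 4, 4))  # 4x4 region; 2x2 grid holding 7, 9, 19, 21
b = roi_max_pool(fmap, (1, 2, 6, 5))  # 3x5 region
print(a.shape, b.shape)  # (2, 2, 1) (2, 2, 1)
```

Regions of different sizes come out with the same shape, which is what allows them to share the fully-connected layers that follow.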

Faster R-CNN:

One of the drawbacks of Fast R-CNN was the slow Selective Search algorithm, so Faster R-CNN replaced it with something called a Region Proposal Network (RPN).

Here is how the RPN works:

- At the last layer of an initial CNN, a 3x3 sliding window moves across the feature map and maps it to a lower dimension (e.g. 256-d).
- For each sliding-window location, it generates multiple possible regions based on k fixed-ratio anchor boxes (default bounding boxes).

Each region proposal consists of:

- An “objectness” score for that region, and
- 4 coordinates representing the bounding box of the region.

In other words, we look at each location in our last feature map and consider k different boxes centered around it: a tall box, a wide box, a large box, etc.
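As a sketch of the "k different boxes centered around it" idea, the snippet below generates anchor boxes for one location. The function name `make_anchors` is made up for illustration, and the three scales and three aspect ratios (giving k = 9) are assumed defaults rather than anything this answer specifies.

```python
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2) centred on
    (cx, cy). ratio = height / width; each box has an area of scale ** 2.
    Scales and ratios are assumed defaults (hypothetical helper)."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)  # wide box when r < 1
            h = s * np.sqrt(r)  # tall box when r > 1
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

anchors = make_anchors(cx=300, cy=200)
print(anchors.shape)  # (9, 4)
```

Running this at every feature-map location gives the full set of candidate boxes that the RPN scores.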

For each of those boxes, we output whether or not we think it contains an object, and what the coordinates for that box are. At a single sliding-window location, that amounts to 2k scores and 4k box coordinates.

The 2k scores represent the softmax probability of each of the k bounding boxes being an “object”. Notice that although the RPN outputs bounding box coordinates, it does not try to classify any potential objects: its sole job is still proposing object regions. If an anchor box has an “objectness” score above a certain threshold, that box's coordinates get passed forward as a region proposal.
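The scoring-and-thresholding step can be sketched in a few lines of NumPy. Everything concrete here is an assumption for illustration: the function name `objectness`, the [background, object] column order, the toy scores and boxes, and the 0.7 threshold.

```python
import numpy as np

def objectness(scores):
    """scores: (k, 2) raw [background, object] scores, one row per anchor.
    Returns the softmax probability of "object" for each anchor.
    The column order is an assumption (hypothetical helper)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp[:, 1] / exp.sum(axis=1)

# Toy values for k = 3 anchors at one location (made up for illustration).
scores = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
anchors = np.array([[0, 0, 10, 10], [5, 5, 40, 30], [8, 2, 20, 50]])

probs = objectness(scores)
threshold = 0.7  # arbitrary assumed cut-off
proposals = anchors[probs > threshold]  # only the second anchor survives
```

Note that nothing here says what the object is; the surviving boxes are simply passed on as region proposals.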

Once we have our region proposals, we feed them straight into what is essentially a Fast R-CNN. We add a pooling layer, some fully-connected layers, and finally a softmax classification layer and a bounding box regressor. In a sense, Faster R-CNN = RPN + Fast R-CNN.
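For the bounding box regressor, a common parameterization predicts four offsets (tx, ty, tw, th) relative to the proposal rather than absolute coordinates. The sketch below applies such offsets to a box; the function name `apply_deltas` and the example numbers are assumptions for illustration, and an actual implementation may normalize the offsets differently.

```python
import numpy as np

def apply_deltas(box, deltas):
    """box: (x1, y1, x2, y2); deltas: (tx, ty, tw, th) from the regressor.
    Returns the refined box (hypothetical helper, assumed parameterization)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    tx, ty, tw, th = deltas
    new_cx, new_cy = cx + tx * w, cy + ty * h       # shift centre, scaled by box size
    new_w, new_h = w * np.exp(tw), h * np.exp(th)   # rescale width/height in log space
    return np.array([new_cx - new_w / 2, new_cy - new_h / 2,
                     new_cx + new_w / 2, new_cy + new_h / 2])

# Made-up example: move the box right by 10% of its width and halve the width.
refined = apply_deltas((10, 10, 50, 30), (0.1, 0.0, np.log(0.5), 0.0))
```

All-zero offsets leave the box unchanged, so the regressor only has to learn small corrections to an already reasonable proposal.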

Here are links to some TensorFlow implementations:

https://github.com/smallcorgi/Faster-RCNN_TF

https://github.com/CharlesShang/FastMaskRCNN

You can find many more implementations on GitHub.

P.S. I borrowed a lot of material from Joyce Xu's Medium blog.

R-CNN uses the following algorithm:

1. Get region proposals for object detection (using Selective Search).
2. For each region, crop the area from the image and run it through a CNN that classifies the object.

There are more advanced algorithms built upon this, such as Fast R-CNN and Faster R-CNN.

Fast R-CNN:

1. Run the entire image through the CNN.
2. For each region from the region proposals, extract the area using an "RoI pooling" layer and then classify the object.

Faster R-CNN:

1. Run the entire image through the CNN.
2. Using the features computed by the CNN, find region proposals with a region proposal network.
3. For each proposal, extract the area using an "RoI pooling" layer and then classify the object.

There are a lot of implementations in TensorFlow, specifically for Faster R-CNN, which is the most recent variant; just google "faster R-CNN tensorflow".

Good luck