Object detection with R-CNN?
What does R-CNN actually do? Does it use features extracted by a CNN to detect classes within a specified window region? Is there a TensorFlow implementation of it?
R-CNN is the parent of all the algorithms mentioned; it paved the way for researchers to build more complex and more accurate algorithms on top of it. I will try to explain R-CNN and its variants.
R-CNN consists of 3 simple steps:
Fast R-CNN followed soon after R-CNN. It is faster and more accurate thanks to the following points:
Intuitively, it makes a lot of sense to drop the roughly 2000 separate CNN forward passes (one per region proposal) and instead run the convolution once over the whole image, then make the boxes on top of the resulting feature map.
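As a rough sketch of the "convolve once, then make boxes on top" idea: each region proposal, given in image pixels, can be projected onto the shared feature map instead of being pushed through the CNN on its own. The function name `project_box` and the stride of 16 are assumptions for illustration (the stride depends on the backbone network), not something stated in this answer.

```python
def project_box(box, stride=16):
    """Map an image-space box (x1, y1, x2, y2) onto feature-map cells.

    stride is the assumed total downsampling factor of the CNN backbone.
    The top-left corner is rounded down and the bottom-right corner is
    rounded up, so the projected box always covers the original region.
    """
    x1, y1, x2, y2 = box
    return (x1 // stride, y1 // stride,
            -(-x2 // stride), -(-y2 // stride))  # -(-a // b) is ceil(a / b)


# Hypothetical proposals in pixel coordinates.
proposals = [(32, 48, 200, 120), (0, 0, 64, 64)]

# One forward pass produces the feature map; every proposal reuses it.
for box in proposals:
    print(box, "->", project_box(box))
```

With this layout the expensive convolutional pass is shared, and the per-proposal work shrinks to cropping and pooling a small window of features.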
One of the drawbacks of Fast R-CNN was the slow selective search algorithm, so Faster R-CNN replaced it with a Region Proposal Network (RPN).
Here is how the RPN works:
At the last layer of an initial CNN, a 3x3 sliding window moves across the feature map and maps it to a lower dimension (e.g. 256-d). For each sliding-window location, it generates multiple possible regions based on k fixed-ratio anchor boxes (default bounding boxes).
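A minimal sketch of how the k anchor boxes at one sliding-window location could be generated. The function name `make_anchors` and the particular scales and aspect ratios (3 of each, giving k = 9) are assumptions chosen for illustration; the answer itself only says that k fixed-ratio boxes are used.

```python
from itertools import product


def make_anchors(center_x, center_y,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes as (x1, y1, x2, y2).

    Each box is centered on the sliding-window location. ratio is
    height / width, and each box keeps an area of about scale ** 2.
    """
    anchors = []
    for scale, ratio in product(scales, ratios):
        width = scale / ratio ** 0.5
        height = scale * ratio ** 0.5
        anchors.append((center_x - width / 2, center_y - height / 2,
                        center_x + width / 2, center_y + height / 2))
    return anchors


anchors = make_anchors(center_x=300, center_y=200)
print(len(anchors))  # k boxes for this one location
for box in anchors[:3]:
    print(box)
```

The same set of box shapes is reused at every location, which is why they are described as default bounding boxes: the network only has to score and adjust them.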
Each region proposal consists of:
For each of those boxes, we output whether or not we think it contains an object, and what the coordinates for that box are. This is what it looks like at one sliding window location:
The 2k scores (two per box: object and not object) represent the softmax probability of each of the k bounding boxes containing an “object”. Notice that although the RPN outputs bounding box coordinates, it does not try to classify any potential objects: its sole job is still proposing object regions. If an anchor box has an “objectness” score above a certain threshold, that box's coordinates get passed forward as a region proposal.
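The scoring and thresholding step can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: the function names, the 0.7 threshold, and the sample boxes and raw scores are all hypothetical.

```python
import math


def objectness(score_bg, score_fg):
    """Two-way softmax: probability that a box contains an object."""
    m = max(score_bg, score_fg)  # subtract the max for numerical stability
    e_bg = math.exp(score_bg - m)
    e_fg = math.exp(score_fg - m)
    return e_fg / (e_bg + e_fg)


def select_proposals(boxes, scores, threshold=0.7):
    """Keep boxes whose objectness exceeds threshold, best first.

    scores holds one (not-object, object) pair of raw scores per box,
    i.e. 2k numbers for k boxes.
    """
    kept = []
    for box, (score_bg, score_fg) in zip(boxes, scores):
        p = objectness(score_bg, score_fg)
        if p > threshold:
            kept.append((box, p))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical boxes and raw scores for k = 3 anchors at one location.
boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (2, 2, 8, 8)]
scores = [(0.0, 3.0), (1.0, 1.0), (2.0, -1.0)]
for box, p in select_proposals(boxes, scores):
    print(box, round(p, 3))
```

Note that nothing here says which class the object belongs to; the output is only a list of boxes that probably contain something.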
Once we have our region proposals, we feed them straight into what is essentially a Fast R-CNN. We add a pooling layer, some fully-connected layers, and finally a softmax classification layer and bounding box regressor. In a sense, Faster R-CNN = RPN + Fast R-CNN.
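The pooling layer mentioned here is what lets proposals of different sizes feed the same fully-connected layers: each region is reduced to a fixed-size grid. Below is a simplified, pure-Python sketch of region-of-interest max pooling on a single-channel feature map; the function name, the 2x2 output size, and the toy 4x4 feature map are assumptions for illustration.

```python
def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one region of a 2D feature map to output_size x output_size.

    feature_map is a list of rows; roi is (row1, col1, row2, col2) in
    feature-map cells, with row2 and col2 exclusive.
    """
    row1, col1, row2, col2 = roi
    height, width = row2 - row1, col2 - col1
    pooled = []
    for i in range(output_size):
        # Bin edges: floor for the start, ceil for the end, so that
        # every bin contains at least one cell.
        r_start = row1 + (i * height) // output_size
        r_end = row1 + -(-((i + 1) * height) // output_size)
        row = []
        for j in range(output_size):
            c_start = col1 + (j * width) // output_size
            c_end = col1 + -(-((j + 1) * width) // output_size)
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map holding the values 0..15.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(feature_map, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
print(roi_max_pool(feature_map, (1, 1, 4, 3)))  # [[9, 10], [13, 14]]
```

Both regions come out as 2x2 grids even though one is 4x4 cells and the other 3x2, so the classifier and the box regressor always see inputs of the same shape.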
Links to some TensorFlow implementations:
https://github.com/smallcorgi/Faster-RCNN_TF
https://github.com/CharlesShang/FastMaskRCNN
You can find many more implementations on GitHub.
P.S. I borrowed a lot of material from Joyce Xu's Medium blog.
R-CNN uses the following algorithm:
More advanced algorithms build on this, such as Fast R-CNN and Faster R-CNN.
Fast R-CNN:
Faster R-CNN:
There are many TensorFlow implementations, particularly for Faster R-CNN, the most recent variant; just search for "Faster R-CNN TensorFlow".
Good luck