Deep learning for shape localization and recognition

Original post 2017-05-11 15:26:38 4 2 tensorflow/ computer-vision/ deep-learning/ theano/ caffe

I have a set of images, each containing several different shapes, such as the one shown in the following figure. I am trying to localize and recognize these shapes, for instance by drawing a bounding box around each one and possibly labeling it. Which major research papers or deep learning models have been able to solve this kind of problem?

2 answers

Object detection papers such as R-CNN, Faster R-CNN, YOLO, and SSD would help you solve this if you are set on using a deep learning approach.

It would be easy to say that this is a trivial problem solvable with the tools in OpenCV and that deep learning is overkill, but I can see many reasons to use deep learning tools, and saying so would not answer your question.

We assume that your shapes appear at different scales and rotations. The full image shown above is very large to train on directly, and a model trained on it would need a lot of training samples to reach good accuracy on test samples. In this case it is better to train a convolutional neural network on small images (for example 128x128) that contain only one shape each, and then apply a sliding window. The project has three main steps:

1. Generate training and test samples; each image should contain only one shape.
2. Train a classifier to recognize the single shape in each input image.
3. Apply the sliding window: break the original image, which contains many shapes, into overlapping blocks of size 128x128, and pass each block to the model trained in the second step.
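The third step can be sketched in a few lines. This is only a minimal illustration, not code from the answer: the 64-pixel stride, the function names, and the `classify` callable (a stand-in for the model trained in the second step, assumed to return a `(label, score)` pair with `label` set to `None` when no shape is present) are all assumptions. It uses plain Python lists to stay dependency-free; with NumPy the block would simply be `image[y:y + size, x:x + size]`.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values
Detection = Tuple[int, int, int, int, str, float]  # x1, y1, x2, y2, label, score


def window_origins(length: int, size: int, stride: int) -> List[int]:
    """Top-left offsets along one axis; the last window is snapped to the edge."""
    if length <= size:
        return [0]  # image smaller than the window: a single, smaller block
    origins = list(range(0, length - size + 1, stride))
    if origins[-1] != length - size:
        origins.append(length - size)  # make sure the border is covered
    return origins


def sliding_windows(
    image: Image, size: int = 128, stride: int = 64
) -> Iterator[Tuple[int, int, List[List[float]]]]:
    """Yield (x, y, block) for overlapping size x size blocks of the image."""
    height, width = len(image), len(image[0])
    for y in window_origins(height, size, stride):
        for x in window_origins(width, size, stride):
            block = [list(row[x:x + size]) for row in image[y:y + size]]
            yield x, y, block


def detect(
    image: Image,
    classify: Callable[[List[List[float]]], Tuple[Optional[str], float]],
    size: int = 128,
    stride: int = 64,
) -> List[Detection]:
    """Classify every block and keep the ones where a shape was found.

    `classify` is a hypothetical placeholder for the trained model.
    The block position doubles as the bounding box of the detection.
    """
    detections = []
    for x, y, block in sliding_windows(image, size, stride):
        label, score = classify(block)
        if label is not None:
            detections.append((x, y, x + size, y + size, label, score))
    return detections
```

A smaller stride gives finer localization at the cost of more classifier calls; a stride of half the block size is just a common starting point.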

This way you end up with a label for each shape from the trained model, and with the location of each shape from the position of the sliding-window block that produced it. For the classifier you can use exactly the CNN structure from TensorFlow's MNIST tutorial. Here is a paper that applies exactly the same method to fingerprint images to extract local features: A direct fingerprint minutiae extraction approach based on convolutional neural networks
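One detail the answer leaves open is that overlapping blocks may report the same shape more than once. A common way to handle this, which is not part of the original answer, is greedy non-maximum suppression: keep the highest-scoring box and drop lower-scoring boxes of the same label that overlap it too much. The sketch below assumes detections are `(x1, y1, x2, y2, label, score)` tuples; the 0.3 overlap threshold is an arbitrary illustrative value.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int, str, float]  # x1, y1, x2, y2, label, score


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], iou_threshold: float = 0.3) -> List[Box]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box of the same label too strongly."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[5], reverse=True):
        if all(box[4] != k[4] or iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

Applied to the raw per-block detections, this leaves roughly one box per shape, although the boxes remain as coarse as the block size and stride allow.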