
Segment, crop (bounding boxes) and label characters with OpenCV

Original post: 2017-04-21 09:07:42 | Tags: python, opencv, image-processing, python-imaging-library, conv-neural-network

I have a set of images, each of which represents a sequence of characters. I'm wondering whether OpenCV or another technique can segment and crop each character from the image. For instance:

I have as input:

I want to get:

is 5

is 0

is 4

is 1

is 9

is 2

2 Answers

There are two problems in going from your input to your output:

The first is separating your characters. If your images always look like this, with the digits neatly separated, then you should have no trouble separating them with findContours or connectedComponents, possibly combined with a bounding-box function such as minAreaRect.

The second problem is, once you have separated your digits, how to tell which digit each image represents. This problem has a name: OCR.
If you have a lot of images, it is also possible to train a classification algorithm, as your tagging of this question suggests. The "hot topic" right now is deep learning with neural networks, but for simple applications, regular machine-learning classification with hand-designed features might do the trick.
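To make "hand-designed features" a bit more concrete, here is a small sketch using zoning features (mean ink density per grid cell) and a 1-nearest-neighbour lookup. Both choices are mine for illustration, not something this answer prescribes. The names zoning_features and NearestNeighbourDigits are hypothetical, and the code assumes each crop is at least grid pixels on every side, with nonzero values marking ink.

```python
import numpy as np


def zoning_features(crop, grid=4):
    """Hand-designed feature: mean ink density in each cell of a grid x grid layout."""
    crop = np.asarray(crop, dtype=np.float32)
    rows = np.array_split(np.arange(crop.shape[0]), grid)
    cols = np.array_split(np.arange(crop.shape[1]), grid)
    return np.array([crop[np.ix_(r, c)].mean() for r in rows for c in cols])


class NearestNeighbourDigits:
    """1-nearest-neighbour classifier over zoning features (illustrative only)."""

    def fit(self, crops, labels):
        # Store one feature vector per labelled training crop.
        self.features = np.stack([zoning_features(c) for c in crops])
        self.labels = np.asarray(labels)
        return self

    def predict(self, crop):
        # Pick the label of the training crop with the closest feature vector.
        distances = np.linalg.norm(self.features - zoning_features(crop), axis=1)
        return self.labels[int(np.argmin(distances))]
```

Because the features are averaged per cell, crops of different sizes map to vectors of the same length, so no resizing is needed before comparing them. With real data you would fit on many labelled crops per digit.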

If you want to segment the numbers, I would first try morphological opening (because your characters are black on a white background; it would be closing if it were the opposite) in order to fill the holes in your numbers. Then I would project the pixels vertically and analyze the resulting profile. The valley points of this profile give you the vertical limits between characters. You can do the same horizontally to get the upper and lower limits of your characters. This approach only works if the text is horizontal.
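The projection step can be sketched roughly as follows. This is a simplified version that treats a "valley" as any column (or row) with fewer than min_ink character pixels, which is an assumption of mine; touching characters would need real local-minimum detection on the profile. The name split_on_valleys is hypothetical, and the input is assumed to be already binarized with nonzero values on character pixels. The opening step is left out here.

```python
import numpy as np


def split_on_valleys(binary, axis=0, min_ink=1):
    """Return (start, stop) index ranges where the projection contains ink.

    binary: 2-D array, nonzero where a pixel belongs to a character.
    axis=0 sums each column (limits between characters on a horizontal line);
    axis=1 sums each row (upper and lower limits).
    stop is exclusive, so binary[:, start:stop] selects one run when axis=0.
    """
    profile = (np.asarray(binary) > 0).sum(axis=axis)
    has_ink = profile >= min_ink
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], has_ink, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))
```

Calling it with axis=0 on the whole line gives the column range of each character; calling it with axis=1 on each resulting slice then trims the empty rows above and below.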

Then you could use a standard OCR library or go for deep learning. Since these numbers appear to be from the MNIST dataset, you will find a lot of examples of OCR using deep learning or other techniques with this dataset:

http://yann.lecun.com/exdb/mnist/