

Detect text area in an image using python and opencv

I want to detect the text area of images using python 2.7 and opencv 2.4.9 and draw a rectangle around it, like shown in the example image below.

I am new to image processing, so any ideas on how to do this will be appreciated.

[Example image: building blueprint with labeled rooms]

There are multiple ways to go about detecting text in an image.

I recommend looking at this question here, as it may answer your case as well. Although it is not in python, the code can easily be translated from c++ to python (just look at the API and convert the methods from c++ to python; it is not hard, and I did it myself when I tried their code for a separate problem of my own). The solutions there may not work for your case, but I recommend trying them out.

If I were to go about this, I would do the following process:

Prep your image: if all of the images you want to edit are roughly like the one you provided, where the actual design consists of a range of gray colors and the text is always black, I would first white out all content that is not black (or already white). Doing so will leave only the black text.

# must import if working with opencv in python
import numpy as np
import cv2

# whites out pixels whose brightness (V channel) falls in the range
# [lower_val,upper_val], leaving only the black text on a white background
def remove_gray(img,lower_val,upper_val):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lower_bound = np.array([0,0,lower_val])
    upper_bound = np.array([255,255,upper_val])
    mask = cv2.inRange(hsv, lower_bound, upper_bound)  # mask of the gray design
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray[mask > 0] = 255                               # paint the gray content white
    return gray
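
For example, assuming the blueprint's gray design falls roughly between brightness 100 and 250 while the text is darker (these values and the file name are assumptions to tune per image), a call might look like:

img = cv2.imread('blueprint.png')
text_only = remove_gray(img, 100, 250)   # white out the gray design, keeping the black text
cv2.imwrite('text_only.png', text_only)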

Now that all you have left is the black text, the goal is to get those boxes. As stated before, there are different ways of going about this.

Stroke Width Transform (SWT)

The typical way to find text areas: you can find text regions by using the stroke width transform, as depicted in "Detecting Text in Natural Scenes with Stroke Width Transform" by Boris Epshtein, Eyal Ofek, and Yonatan Wexler. To be honest, if this is as fast and reliable as I believe it is, then it is a more efficient method than my code below. You can still use the code above to remove the blueprint design, though, and that may help the overall performance of the SWT algorithm.

Here is a C library that implements their algorithm, but it is stated to be very raw and its documentation is stated to be incomplete. Obviously, a wrapper will be needed in order to use this library with python, and at the moment I do not see an official one offered.

The library I linked is CCV. It is a library meant to be used in your applications, not to recreate algorithms. So this is a tool to be used, which goes against the OP's wish to build it from "first principles", as stated in the comments. Still, it is useful to know it exists if you don't want to code the algorithm yourself.


Home Brewed Non-SWT Method

If you have metadata for each image, say in an xml file, that states how many rooms are labeled in each image, then you can access that xml file, get the data about how many labels are in the image, and then store that number in some variable, say num_of_labels. Now take your image and put it through a while loop that erodes at a set rate you specify, finding external contours in the image in each loop and stopping once you have the same number of external contours as your num_of_labels. Then simply find each contour's bounding box and you are done.
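
For example, if that metadata lived in a hypothetical layout such as <images><image name="1.png" num_labels="5"/></images>, reading it might look like this (the file name and attribute names are assumptions):

import xml.etree.ElementTree as ET

def labels_for(image_name, xml_path='labels.xml'):
    # hypothetical layout: <images><image name="1.png" num_labels="5"/></images>
    root = ET.parse(xml_path).getroot()
    for node in root.findall('image'):
        if node.get('name') == image_name:
            return int(node.get('num_labels'))
    return None

num_of_labels = labels_for('1.png')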

# erodes image based on given kernel size (erosion = expands black areas)
def erode( img, kern_size = 3 ):
    retval, img = cv2.threshold(img, 254.0, 255.0, cv2.THRESH_BINARY) # threshold so we only deal with black and white.
    kern = np.ones((kern_size,kern_size),np.uint8) # make a kernel for erosion based on given kernel size.
    eroded = cv2.erode(img, kern, iterations=1) # erode your image to blobbify black areas
    y,x = eroded.shape # get shape of image to make a 1px white border around it, to avoid problems with findContours.
    return cv2.rectangle(eroded, (0,0), (x,y), (255,255,255), 1)

# finds contours of eroded image
def prep( img, kern_size = 3 ):
    img = erode( img, kern_size )
    retval, img = cv2.threshold(img, 200.0, 255.0, cv2.THRESH_BINARY_INV) # invert colors for findContours
    # findContours returns 2 values in OpenCV 2.4/4.x and 3 values in 3.x, so normalize the output
    result = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return img, result[-2], result[-1]

# given img & number of desired blobs, returns contours of blobs.
def blobbify(img, num_of_labels, kern_size = 3, dilation_rate = 10):
    prep_img, contours, hierarchy = prep( img.copy(), kern_size ) # erode img and check the current contour count.
    previous = (prep_img, contours, hierarchy)
    while len(contours) > num_of_labels:
        previous = (prep_img, contours, hierarchy) # remember the last result before growing the kernel.
        kern_size += dilation_rate # add dilation_rate to kern_size to merge blobs further. Remember kern_size must always be odd.
        prep_img, contours, hierarchy = prep( img.copy(), kern_size ) # erode img and check the contour count again.
    if len(contours) < num_of_labels:
        return previous # overshot and merged too much, so fall back to the previous iteration.
    return (prep_img, contours, hierarchy)

# finds bounding boxes of all contours
def bounding_box(contours):
    bBox = []
    for curve in contours:
        box = cv2.boundingRect(curve)
        bBox.append(box)
    return bBox
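
Putting the pieces together, a minimal usage sketch might look like this (the file name, threshold values, and label count are assumptions to adjust for your own images):

img = cv2.imread('blueprint.png')                      # hypothetical input file
text_only = remove_gray(img, 100, 250)                 # keep only the black text (values to tune)
num_of_labels = 5                                      # or read it from metadata as sketched above
prep_img, contours, hierarchy = blobbify(text_only, num_of_labels)
for x, y, w, h in bounding_box(contours):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('labeled.png', img)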

The resulting boxes from the above method will have space around the labels, and this may include part of the original design if the boxes are applied to the original image. To avoid this, make regions of interest via your newfound boxes and trim the white space. Then save that ROI's shape as your new box.
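
As a rough sketch of that trimming step (trim_box is a hypothetical helper; it assumes text_only is the single-channel image produced by remove_gray above and box comes from bounding_box):

# shrink one loose box down to the dark (text) pixels it actually contains
def trim_box(text_only, box):
    x, y, w, h = box
    roi = text_only[y:y+h, x:x+w]
    ys, xs = np.where(roi < 128)          # coordinates of dark pixels inside the ROI
    if len(xs) == 0:
        return box                        # nothing dark inside, keep the original box
    return (x + int(xs.min()), y + int(ys.min()),
            int(xs.max() - xs.min()) + 1, int(ys.max() - ys.min()) + 1)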

Perhaps you have no way of knowing how many labels will be in the image. If this is the case, then I recommend playing around with erosion values until you find the best one for your case and get the desired blobs.

Or you could try finding contours on the remaining content after removing the design, and combine bounding boxes into one rectangle based on their distance from each other.
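
One naive way to do that merging, as a sketch (merge_close_boxes is a hypothetical helper and the max_gap distance is an assumption to tune):

# merge boxes whose edges are closer than max_gap pixels (naive O(n^2) pass)
def merge_close_boxes(boxes, max_gap=20):
    boxes = [list(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                ax, ay, aw, ah = boxes[i]
                bx, by, bw, bh = boxes[j]
                # gap between the two rectangles along each axis (negative means overlap)
                gap_x = max(ax, bx) - min(ax + aw, bx + bw)
                gap_y = max(ay, by) - min(ay + ah, by + bh)
                if gap_x < max_gap and gap_y < max_gap:
                    nx, ny = min(ax, bx), min(ay, by)
                    boxes[i] = [nx, ny, max(ax + aw, bx + bw) - nx, max(ay + ah, by + bh) - ny]
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return [tuple(b) for b in boxes]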

After you have found your boxes, simply use those boxes with respect to the original image and you will be done.


Scene Text Detection Module in OpenCV 3

As mentioned in the comments to your question, there already exists a means of scene text detection (not document text detection) in opencv 3. I understand you do not have the ability to switch versions, but for those with the same question who are not limited to an older opencv version, I decided to include this at the end. Documentation for the scene text detection can be found with a simple google search.

The opencv module for text detection also comes with text recognition that implements Tesseract, which is a free, open-source text recognition module. The downfall of Tesseract, and therefore of opencv's scene text recognition module, is that it is not as refined as commercial applications and is time consuming to use, which decreases its performance. But it is free, so it is the best we have without paying money if you want text recognition as well.
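
For readers on opencv 3, a rough sketch of using the contrib text module's extremal-region filters might look like the following. This assumes an opencv-contrib build with the text module and the trained_classifierNM1.xml / trained_classifierNM2.xml files that ship with its sample data, so treat it as a starting point rather than a finished recipe:

import cv2

img = cv2.imread('1.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Neumann & Matas extremal-region filters from the opencv-contrib text module
erc1 = cv2.text.loadClassifierNM1('trained_classifierNM1.xml')
er1 = cv2.text.createERFilterNM1(erc1, 16, 0.00015, 0.13, 0.2, True, 0.1)
erc2 = cv2.text.loadClassifierNM2('trained_classifierNM2.xml')
er2 = cv2.text.createERFilterNM2(erc2, 0.5)

# detect candidate text regions and group them into word-level rectangles
regions = cv2.text.detectRegions(gray, er1, er2)
rects = cv2.text.erGrouping(img, gray, [r.tolist() for r in regions])

for x, y, w, h in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('scene_text.png', img)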

Links:

Honestly, I lack the experience and expertise in both opencv and image processing to provide a detailed way of implementing their text detection module. The same goes for the SWT algorithm. I just got into this stuff these past few months, but as I learn more I will edit this answer.


Here's a simple image processing approach using only thresholding and contour filtering:

  1. Obtain binary image. Load the image, convert to grayscale, Gaussian blur, and adaptive threshold.

  2. Combine adjacent text. We create a rectangular structuring kernel, then dilate to form a single contour.

  3. Filter for text contours. We find contours and filter using contour area. From here we can draw the bounding box with cv2.rectangle.


Using this original input image (removed red lines)


After converting the image to grayscale and applying a Gaussian blur, we adaptively threshold to obtain a binary image

Next we dilate to combine the text into a single contour

From here we find contours and filter using a minimum threshold area (in case there is small noise). Here's the result

If we wanted to, we could also extract and save each ROI using Numpy slicing


Code

import cv2

# Load image, grayscale, Gaussian blur, adaptive threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (9,9), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,30)

# Dilate to combine adjacent text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(thresh, kernel, iterations=4)

# Find contours, highlight text areas, and extract ROIs
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    if area > 10000:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
        # ROI = image[y:y+h, x:x+w]
        # cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        # ROI_number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()

Detecting a text area in OpenCV has become very simple since EAST came into the picture. The text detector is not only accurate, but it is capable of running in near real-time at approximately 13 FPS on 720p images. A kick-start tutorial can be found here.
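
As a minimal sketch of running EAST through OpenCV's dnn module (the model file name, the 320x320 input size, and the score thresholds are assumptions you may need to adjust; download the pre-trained frozen_east_text_detection.pb separately):

import cv2
import numpy as np

image = cv2.imread('1.png')
orig_h, orig_w = image.shape[:2]
new_w, new_h = 320, 320                                   # EAST needs dimensions that are multiples of 32
ratio_w, ratio_h = orig_w / float(new_w), orig_h / float(new_h)

net = cv2.dnn.readNet('frozen_east_text_detection.pb')
blob = cv2.dnn.blobFromImage(image, 1.0, (new_w, new_h),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(['feature_fusion/Conv_7/Sigmoid',
                                'feature_fusion/concat_3'])

rects, confidences = [], []
rows, cols = scores.shape[2:4]
for y in range(rows):
    for x in range(cols):
        score = scores[0, 0, y, x]
        if score < 0.5:
            continue
        # each output cell corresponds to a 4x4 block of the resized input
        offset_x, offset_y = x * 4.0, y * 4.0
        angle = geometry[0, 4, y, x]
        cos, sin = np.cos(angle), np.sin(angle)
        h = geometry[0, 0, y, x] + geometry[0, 2, y, x]
        w = geometry[0, 1, y, x] + geometry[0, 3, y, x]
        end_x = int(offset_x + cos * geometry[0, 1, y, x] + sin * geometry[0, 2, y, x])
        end_y = int(offset_y - sin * geometry[0, 1, y, x] + cos * geometry[0, 2, y, x])
        rects.append((int(end_x - w), int(end_y - h), int(end_x), int(end_y)))
        confidences.append(float(score))

# non-maximum suppression to merge overlapping detections
boxes = cv2.dnn.NMSBoxes([(x1, y1, x2 - x1, y2 - y1) for x1, y1, x2, y2 in rects],
                         confidences, 0.5, 0.4)
for i in np.array(boxes).flatten():
    x1, y1, x2, y2 = rects[i]
    cv2.rectangle(image, (int(x1 * ratio_w), int(y1 * ratio_h)),
                  (int(x2 * ratio_w), int(y2 * ratio_h)), (0, 255, 0), 2)
cv2.imwrite('east_boxes.png', image)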

There's a good tutorial on LearnOpenCV: https://learnopencv.com/deep-learning-based-text-detection-using-opencv-c-python/

The source code can be found here: https://github.com/spmallick/learnopencv/tree/master/TextDetectionEAST

There is a further OCR tutorial here: https://learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/

The OCR source code is here: https://github.com/spmallick/learnopencv/blob/master/OCR/ocr_simple.py
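
If you also want to read the text inside the boxes you detect, a minimal sketch using the pytesseract wrapper might look like this (it assumes the Tesseract binary is installed and that the box coordinates come from one of the detection steps above):

import cv2
import pytesseract

image = cv2.imread('1.png')
x, y, w, h = 100, 100, 200, 50                              # hypothetical box from a previous detection step
roi = cv2.cvtColor(image[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
text = pytesseract.image_to_string(roi, config='--psm 7')   # psm 7 treats the ROI as a single text line
print(text.strip())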
