
How to recognize text with colored background images?

I am new to OpenCV and Python, as well as Tesseract. I am writing a script that recognizes text in an image. My code works perfectly on black text on a white background, or white text on a black background, but not on colored images, for example white text on a blue background such as a button. Could the font also be affecting this? In this case, I am trying to find the "Reboot" text (the button).

This is the sample image:

I tried a bunch of code and methods for image preprocessing with OpenCV (binarization, noise reduction, grayscale conversion), but none of them gave me the result I need.

This is the sample code:

from PIL import Image
import pytesseract
import cv2
import numpy as np

# image = Image.open('image.png')
# image = image.convert('-1')
# image.save('new.png')

filename = 'image.png'
outputname = 'converted.png'

# grayscale -----------------------------------------------------
image = cv2.imread(filename)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite(outputname,gray_image)

# binarize -----------------------------------------------------
im_gray = cv2.imread(outputname, cv2.IMREAD_GRAYSCALE)
(thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite(outputname, im_bw)

# remove noise -----------------------------------------------------
im = cv2.imread(outputname)
morph = im.copy()

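# note: with a 1x1 kernel these close/open operations leave the image unchanged
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)

# (this 2x2 kernel is never used below)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
# split the BGR image into its three channels
image_channels = np.split(np.asarray(morph), 3, axis=2)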

channel_height, channel_width, _ = image_channels[0].shape

# apply Otsu threshold to each channel
for i in range(0, 3):
    _, image_channels[i] = cv2.threshold(image_channels[i], 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
    # restore the (h, w, 1) shape so the channels can be concatenated again
    image_channels[i] = np.reshape(image_channels[i], (channel_height, channel_width, 1))

# merge the channels
image_channels = np.concatenate((image_channels[0], image_channels[1], image_channels[2]), axis=2)

# save the denoised image
cv2.imwrite(outputname, image_channels)

# run Tesseract and save its TSV output (word boxes, confidences, text)
image = Image.open(outputname)
data_string = pytesseract.image_to_data(image, config='--oem 1')
with open('image.tsv', 'w', encoding='utf-8') as f:
    f.write(data_string)

By running the code, I get this image:

And this is the Tesseract output in TSV format:

level   page_num    block_num   par_num line_num    word_num    left    top width   height  conf    text
1   1   0   0   0   0   0   0   1024    768 -1  
2   1   1   0   0   0   2   13  1002    624 -1  
3   1   1   1   0   0   2   13  1002    624 -1  
4   1   1   1   1   0   172 13  832 22  -1  
5   1   1   1   1   1   172 13  127 22  84  CONFIGURATION
5   1   1   1   1   2   822 17  59  11  92  CENTOS
5   1   1   1   1   3   887 17  7   11  95  7
5   1   1   1   1   4   900 17  104 11  95  INSTALLATION
4   1   1   1   2   0   86  29  900 51  -1  
5   1   1   1   2   1   86  35  15  45  12  4
5   1   1   1   2   2   825 30  27  40  50  Bes
5   1   1   1   2   3   952 29  34  40  51  Hel
4   1   1   1   3   0   34  91  87  17  -1  
5   1   1   1   3   1   34  91  87  17  90  CentOS
4   1   1   1   4   0   2   116 9   8   -1  
5   1   1   1   4   1   2   116 9   8   0   ‘
4   1   1   1   5   0   184 573 57  14  -1  
5   1   1   1   5   1   184 573 57  14  90  Complete!
4   1   1   1   6   0   634 606 358 14  -1  
5   1   1   1   6   1   634 606 43  10  89  CentOS
5   1   1   1   6   2   683 609 7   7   96  is
5   1   1   1   6   3   696 609 24  7   96  now
5   1   1   1   6   4   725 606 67  14  96  successfully
5   1   1   1   6   5   797 606 45  10  96  installed
5   1   1   1   6   6   848 606 18  10  96  and
5   1   1   1   6   7   872 599 29  25  96  ready
5   1   1   1   6   8   906 599 15  25  95  for
5   1   1   1   6   9   928 609 20  11  96  you
5   1   1   1   6   10  953 608 12  8   96  to
5   1   1   1   6   11  971 606 21  10  95  use!
4   1   1   1   7   0   775 623 217 14  -1  
5   1   1   1   7   1   775 623 15  10  95  Go
5   1   1   1   7   2   796 623 31  10  96  ahead
5   1   1   1   7   3   833 623 18  10  96  and
5   1   1   1   7   4   857 623 38  10  96  reboot
5   1   1   1   7   5   900 625 12  8   96  to
5   1   1   1   7   6   918 625 25  8   95  start
5   1   1   1   7   7   949 626 28  11  96  using
5   1   1   1   7   8   983 623 9   10  93  it!
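To check programmatically whether a given word was recognized, the TSV string returned by `image_to_data` can be parsed with Python's standard `csv` module. The helper below (`find_word` is a hypothetical name, not part of pytesseract) is a minimal sketch that assumes the usual tab-separated layout with the header shown above; it returns the box of every word-level (level 5) row matching a word. Run on the output above, the only match for "reboot" is the lowercase word in "Go ahead and reboot to start using it!", not the button label.

```python
import csv
import io


def find_word(tsv_text, word):
    """Return (left, top, width, height, conf) for every word-level row
    (level 5) whose text equals `word`, ignoring case."""
    # QUOTE_NONE: OCR text may contain stray quote characters
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    matches = []
    for row in reader:
        text = (row.get("text") or "").strip()  # missing text -> ""
        if row["level"] == "5" and text.lower() == word.lower():
            box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
            matches.append(box + (float(row["conf"]),))
    return matches


sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "5\t1\t1\t1\t7\t4\t857\t623\t38\t10\t96\treboot\n"
)
print(find_word(sample, "Reboot"))  # [(857, 623, 38, 10, 96.0)]
```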

As you can see, the "Reboot" text is not showing. Maybe it is because of the font? Or the color?

Here are two different approaches:

1. Traditional image processing and contour filtering

The main idea is to extract the ROI (region of interest), then apply Tesseract OCR to it.

  • Convert the image to grayscale and apply a Gaussian blur
  • Adaptive threshold
  • Find contours
  • Iterate through the contours and filter using contour approximation and area
  • Extract the ROI
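Adaptive thresholding is what makes this work on colored backgrounds: instead of one global cut-off, each pixel is compared with a threshold computed from its own neighbourhood, so local edges such as the button outline stand out wherever they are. The NumPy sketch below illustrates the idea under two stated simplifications: it uses a plain box mean (the `ADAPTIVE_THRESH_MEAN_C` variant) rather than the Gaussian-weighted mean used in the answer, and it assumes edge-replicating padding. The function name is hypothetical; in real code `cv2.adaptiveThreshold` is the call to use.

```python
import numpy as np


def adaptive_mean_threshold_inv(gray, block_size=9, c=3):
    """Sketch of a local (adaptive) threshold with inverted output.

    Each pixel is compared with the mean of its block_size x block_size
    neighbourhood minus c. Pixels brighter than that local threshold become
    0; all others become 255 (the THRESH_BINARY_INV convention).
    """
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    pad = block_size // 2
    # replicate edge pixels so every pixel has a full neighbourhood (assumption)
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (block_size, block_size))
    local_mean = windows.mean(axis=(-2, -1))  # shape == gray.shape
    threshold = local_mean - c
    return np.where(gray > threshold, 0, 255).astype(np.uint8)


# a light page with a darker "button": only the button's rim is marked
page = np.full((20, 20), 240, dtype=np.uint8)
page[5:15, 5:15] = 100
edges = adaptive_mean_threshold_inv(page, block_size=5, c=3)
```

Flat regions (the page and the button interior) come out black, while button pixels along its border come out white, which is the outline `cv2.findContours` can then pick up.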

Once we obtain a binary image from adaptive thresholding, we find contours and filter them using contour approximation with cv2.arcLength() and cv2.approxPolyDP(). If the approximated contour has four points, we assume it is either a rectangle or a square. In addition, we apply a second filter on contour area to ensure that we isolate the correct ROI. Here's the extracted ROI:


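import cv2

# Load the screenshot, convert to grayscale and blur slightly to reduce noise
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
# Inverted adaptive threshold: pixels darker than their neighbourhood become white
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,9,3)

# findContours returns 2 values in OpenCV 2.x/4.x and 3 values in OpenCV 3.x
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    # Approximate the contour with a polygon; four vertices suggest a rectangle
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)
    # Keep only large four-sided shapes (2200 is specific to this screenshot)
    if len(approx) == 4 and area > 2200:
        x,y,w,h = cv2.boundingRect(approx)
        ROI = image[y:y+h, x:x+w]
        cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        ROI_number += 1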

Now we can throw this into Pytesseract. Note that Tesseract works best with black text on a white background, so we do a bit of preprocessing first. Here's the preprocessed image and the result from Pytesseract:


Reboot

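import cv2
import pytesseract

# Only needed on Windows when tesseract.exe is not on the PATH
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Read one of the ROIs saved above (assumed here to be ROI_0.png) in grayscale
image = cv2.imread('ROI_0.png', cv2.IMREAD_GRAYSCALE)
# Otsu picks the threshold automatically; the light text becomes white (255)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Invert so the text is black on a white background
result = 255 - thresh

# --psm 10 treats the image as a single character; --psm 7 (single line)
# or --psm 8 (single word) may also be worth trying for a one-word button
data = pytesseract.image_to_string(result, lang='eng', config='--psm 10')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()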
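The snippet above hardcodes `255 - thresh` because the button text is known to be lighter than its background. As a hedged generalization (an assumption, not part of the original answer), the sketch below re-implements Otsu's threshold in NumPy to show what `THRESH_OTSU` computes (the cut-off that maximizes between-class variance), then inverts only if the ROI border, assumed to be mostly background, came out dark. Function names are hypothetical; in practice `cv2.threshold` with `THRESH_OTSU` should be used for the thresholding itself.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold t for a uint8 image: the split into
    classes <= t and > t that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / denom
    between[denom == 0] = 0.0              # one class empty: no valid split
    return int(np.argmax(between))


def to_dark_text_on_light(gray):
    """Binarize with Otsu, then invert if the background came out dark.

    Assumption: pixels on the ROI border are mostly background, so their
    majority value decides the polarity.
    """
    t = otsu_threshold(gray)
    binary = np.where(gray > t, 255, 0).astype(np.uint8)  # THRESH_BINARY rule
    border = np.concatenate([binary[0], binary[-1], binary[:, 0], binary[:, -1]])
    if border.mean() < 128:     # background is mostly black
        binary = 255 - binary   # flip to black text on white
    return binary


# white "text" (230) on a darker button (60), as after grayscale conversion
roi = np.full((20, 40), 60, dtype=np.uint8)
roi[8:12, 10:30] = 230
clean = to_dark_text_on_light(roi)
```

The same function leaves an ROI that already has dark text on a light background unchanged, so one preprocessing path can serve both the button and the normal page text.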

Normally, you would also need to use morphological transformations to smooth the image, but in this case the text is clean enough.
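For reference, the two most common smoothing operations are opening (erode, then dilate), which removes white specks smaller than the kernel, and closing (dilate, then erode), which fills small black holes. The NumPy sketch below mirrors what `cv2.morphologyEx` does with `MORPH_OPEN`/`MORPH_CLOSE` and a square rectangular kernel on a 0/255 binary image; the helper names and the border handling (padding that never erodes or dilates from outside the image) are assumptions. It also shows why the 1x1 kernel in the question has no effect.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(binary, size, reducer, pad_value):
    """Apply min/max over every size x size neighbourhood."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1)).astype(binary.dtype)


def erode(binary, size=3):
    """A pixel stays 255 only if its whole neighbourhood is 255."""
    return _window_reduce(binary, size, np.min, 255)


def dilate(binary, size=3):
    """A pixel becomes 255 if any pixel in its neighbourhood is 255."""
    return _window_reduce(binary, size, np.max, 0)


def opening(binary, size=3):
    """Erode then dilate: removes white specks smaller than the kernel."""
    return dilate(erode(binary, size), size)


def closing(binary, size=3):
    """Dilate then erode: fills black holes smaller than the kernel."""
    return erode(dilate(binary, size), size)


img = np.zeros((12, 12), dtype=np.uint8)
img[2:7, 2:7] = 255   # a 5x5 "stroke"
img[10, 10] = 255     # an isolated noise pixel
opened = opening(img, 3)  # the noise pixel disappears, the stroke survives
```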

2. Color thresholding

The second approach is to use color thresholding with lower and upper HSV bounds to create a mask from which we can extract the ROI. Look here for a complete example. Once the ROI is extracted, we follow the same steps to preprocess the image before passing it to Pytesseract.
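As a rough illustration of the color-thresholding idea, the sketch below converts a BGR image to HSV using OpenCV's 8-bit convention (H in 0 to 179, S and V in 0 to 255), builds a mask of pixels inside lower/upper bounds, and takes the mask's bounding box as the ROI. It is a NumPy stand-in for `cv2.cvtColor(..., cv2.COLOR_BGR2HSV)`, `cv2.inRange` and `cv2.boundingRect`, and OpenCV's own conversion may round slightly differently. The helper names and the blue bounds `LOWER_BLUE`/`UPPER_BLUE` are hypothetical values chosen for illustration; real bounds have to be tuned on the actual screenshot.

```python
import numpy as np

# Hypothetical HSV bounds for a blue button (tune on the real image)
LOWER_BLUE = (100, 80, 50)
UPPER_BLUE = (130, 255, 255)


def bgr_to_hsv_opencv_scale(image):
    """uint8 BGR -> HSV with H = degrees / 2 in [0, 179], S, V in [0, 255]."""
    bgr = image.astype(np.float64)
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    v = bgr.max(axis=-1)
    delta = v - bgr.min(axis=-1)
    s = np.where(v > 0, delta / np.where(v > 0, v, 1) * 255.0, 0.0)
    safe = np.where(delta > 0, delta, 1)  # avoid division by zero on grays
    h = np.where(v == r, 60.0 * (g - b) / safe,
                 np.where(v == g, 120.0 + 60.0 * (b - r) / safe,
                          240.0 + 60.0 * (r - g) / safe))
    h = np.where(delta > 0, h % 360.0, 0.0)
    return np.stack([np.round(h / 2) % 180, np.round(s), v], axis=-1).astype(np.uint8)


def in_range(hsv, lower, upper):
    """255 where every channel lies within [lower, upper], else 0."""
    inside = (hsv >= np.asarray(lower)) & (hsv <= np.asarray(upper))
    return np.all(inside, axis=-1).astype(np.uint8) * 255


def mask_bounding_box(mask):
    """(x, y, w, h) of the nonzero pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))


img = np.full((30, 60, 3), 255, dtype=np.uint8)  # white page
img[10:20, 15:45] = (200, 120, 30)              # blue-ish button (BGR)
img[14:16, 20:40] = (255, 255, 255)             # white "text" on it
mask = in_range(bgr_to_hsv_opencv_scale(img), LOWER_BLUE, UPPER_BLUE)
x, y, w, h = mask_bounding_box(mask)
button = img[y:y + h, x:x + w]                  # ROI to preprocess and OCR
```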

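Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.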

 