How to recognize text with colored background images?
I am new to OpenCV and Python, as well as Tesseract. I am writing a script that recognizes text in an image. My code works perfectly with black text on a white background, or white text on a black background, but not with colored images, for example white text on a blue background such as a button. Does the font also affect this? In this case, I am trying to find the "Reboot" text (the button).
This is the sample image:
I tried a number of image preprocessing methods with OpenCV (binarization, noise reduction, grayscale conversion), but none of them gave the result I wanted.
This is the sample code:
from PIL import Image
import pytesseract
import cv2
import numpy as np
# image = Image.open('image.png')
# image = image.convert('-1')
# image.save('new.png')
filename = 'image.png'
outputname = 'converted.png'
# grayscale -----------------------------------------------------
image = cv2.imread(filename)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite(outputname,gray_image)
# binarize -----------------------------------------------------
im_gray = cv2.imread(outputname, cv2.IMREAD_GRAYSCALE)
(thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite(outputname, im_bw)
# remove noise -----------------------------------------------------
im = cv2.imread(outputname)
morph = im.copy()
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
image_channels = np.split(np.asarray(morph), 3, axis=2)
channel_height, channel_width, _ = image_channels[0].shape
# apply Otsu threshold to each channel
for i in range(0, 3):
    _, image_channels[i] = cv2.threshold(image_channels[i], 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
    image_channels[i] = np.reshape(image_channels[i], newshape=(channel_height, channel_width, 1))
# merge the channels
image_channels = np.concatenate((image_channels[0], image_channels[1], image_channels[2]), axis=2)
# save the denoised image
cv2.imwrite(outputname, image_channels)
image = Image.open(outputname)
data_string = pytesseract.image_to_data(image, config='--oem 1')
data_string = data_string.encode('utf-8')
open('image.tsv', 'wb').write(data_string)
By running the code, I get this image: [![enter image description here][1]][1]
And this is the Tesseract result in TSV format:
level page_num block_num par_num line_num word_num left top width height conf text
1 1 0 0 0 0 0 0 1024 768 -1
2 1 1 0 0 0 2 13 1002 624 -1
3 1 1 1 0 0 2 13 1002 624 -1
4 1 1 1 1 0 172 13 832 22 -1
5 1 1 1 1 1 172 13 127 22 84 CONFIGURATION
5 1 1 1 1 2 822 17 59 11 92 CENTOS
5 1 1 1 1 3 887 17 7 11 95 7
5 1 1 1 1 4 900 17 104 11 95 INSTALLATION
4 1 1 1 2 0 86 29 900 51 -1
5 1 1 1 2 1 86 35 15 45 12 4
5 1 1 1 2 2 825 30 27 40 50 Bes
5 1 1 1 2 3 952 29 34 40 51 Hel
4 1 1 1 3 0 34 91 87 17 -1
5 1 1 1 3 1 34 91 87 17 90 CentOS
4 1 1 1 4 0 2 116 9 8 -1
5 1 1 1 4 1 2 116 9 8 0 ‘
4 1 1 1 5 0 184 573 57 14 -1
5 1 1 1 5 1 184 573 57 14 90 Complete!
4 1 1 1 6 0 634 606 358 14 -1
5 1 1 1 6 1 634 606 43 10 89 CentOS
5 1 1 1 6 2 683 609 7 7 96 is
5 1 1 1 6 3 696 609 24 7 96 now
5 1 1 1 6 4 725 606 67 14 96 successfully
5 1 1 1 6 5 797 606 45 10 96 installed
5 1 1 1 6 6 848 606 18 10 96 and
5 1 1 1 6 7 872 599 29 25 96 ready
5 1 1 1 6 8 906 599 15 25 95 for
5 1 1 1 6 9 928 609 20 11 96 you
5 1 1 1 6 10 953 608 12 8 96 to
5 1 1 1 6 11 971 606 21 10 95 use!
4 1 1 1 7 0 775 623 217 14 -1
5 1 1 1 7 1 775 623 15 10 95 Go
5 1 1 1 7 2 796 623 31 10 96 ahead
5 1 1 1 7 3 833 623 18 10 96 and
5 1 1 1 7 4 857 623 38 10 96 reboot
5 1 1 1 7 5 900 625 12 8 96 to
5 1 1 1 7 6 918 625 25 8 95 start
5 1 1 1 7 7 949 626 28 11 96 using
5 1 1 1 7 8 983 623 9 10 93 it!
As you can see, the "Reboot" text is not detected. Is it because of the font, or the color?
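Rather than scanning the TSV by eye, the check for a given word can be scripted. The following is a minimal sketch, assuming the TSV is the tab-separated string returned by pytesseract.image_to_data; the helper name find_word and the sample rows are made up for illustration.

```python
import csv
import io


def find_word(tsv_text, target, min_conf=0.0):
    """Return (left, top, width, height, conf) for each word equal to target."""
    # QUOTE_NONE: recognized text may contain stray quote characters
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # block, paragraph, and line rows carry no text
        try:
            conf = float(row["conf"])
        except (TypeError, ValueError):
            continue
        if text.lower() == target.lower() and conf >= min_conf:
            hits.append((int(row["left"]), int(row["top"]),
                         int(row["width"]), int(row["height"]), conf))
    return hits


# Two illustrative rows in the same column layout as the output above
sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "4\t1\t1\t1\t7\t0\t775\t623\t217\t14\t-1\t\n"
    "5\t1\t1\t1\t7\t4\t857\t623\t38\t10\t96\treboot\n"
)
print(find_word(sample, "Reboot"))
```

The comparison is case-insensitive, so on the output above it would match the "reboot" in the sentence at the bottom of the screen, not the button label, which is missing from the TSV entirely.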
Here are two different approaches:
1. Traditional image processing and contour filtering
The main idea is to extract the ROI and then apply Tesseract OCR. Once we obtain a binary image from adaptive thresholding, we find contours and filter them using contour approximation with cv2.arcLength() and cv2.approxPolyDP(). If the approximated contour has four points, we assume it is either a rectangle or a square. In addition, we apply a second filter on contour area to ensure that we isolate the correct ROI. The following code extracts the ROI:
import cv2

# Load the image, convert to grayscale, blur lightly, and binarize
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,9,3)

# Find external contours; findContours returns 2 or 3 values depending on the OpenCV version
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Keep contours that approximate to four points and are large enough
ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)
    if len(approx) == 4 and area > 2200:
        x,y,w,h = cv2.boundingRect(approx)
        ROI = image[y:y+h, x:x+w]
        cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        ROI_number += 1
Now we can pass this to Pytesseract. Note that Tesseract generally works best when the text is dark and the background is light, so we do a bit of preprocessing first. Here's the result from Pytesseract on the preprocessed image:
Reboot
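import cv2
import pytesseract

# Windows-specific path to the Tesseract binary; adjust or remove on other systems
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Path to the extracted ROI, read as grayscale (the snippet above writes ROI_0.png, ROI_1.png, ...)
image = cv2.imread('ROI.png',0)
# Otsu threshold, then invert so the white text becomes black on a white background
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh

# --psm 10 treats the image as a single character; --psm 7 (single line) or 8 (single word) may also be worth trying
data = pytesseract.image_to_string(result, lang='eng',config='--psm 10 ')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()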
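The inversion above is unconditional, which suits white text on a dark button. For a script that also has to handle dark-on-light input, the step could be made conditional. This is a minimal sketch, assuming a 0/255 binary image in which the background covers more area than the text; the helper name dark_text_on_light is hypothetical.

```python
import numpy as np


def dark_text_on_light(binary):
    """Return a 0/255 image whose majority (background) pixels are white."""
    binary = np.asarray(binary, dtype=np.uint8)
    white_fraction = np.count_nonzero(binary == 255) / binary.size
    # Assumption: the background occupies more pixels than the text
    if white_fraction < 0.5:
        return 255 - binary  # mostly black: invert to get dark text on white
    return binary.copy()


# Synthetic stand-in for a thresholded ROI: a white "text" bar on black
demo = np.zeros((10, 20), dtype=np.uint8)
demo[4:6, 5:15] = 255
fixed = dark_text_on_light(demo)
```

The area assumption breaks down for very bold text that fills most of a tightly cropped ROI, so it is worth checking on real crops.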
Normally, you would also need to use morphological transformations to smooth the image, but in this case the text is already clean enough.
2. Color thresholding
The second approach is to use color thresholding with lower and upper HSV bounds to create a mask from which we can extract the ROI. Look here for a complete example. Once the ROI is extracted, we follow the same steps to preprocess the image before passing it to Pytesseract.