How to OCR image with Tesseract

I am starting to learn OpenCV and Tesseract, and have trouble with what seems to be a very simple example.

Here is an image that I am trying to OCR, which reads "171 m":

[Image: original image]

I do some preprocessing. Since blue is the dominant color of the text, I extract the blue channel and apply simple thresholding.

img = cv2.imread('171_m.png')[:, :, 0]  # OpenCV loads images as BGR, so index 0 is the blue channel
_, thresh = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)

The resulting image looks like this:

[Image: blue channel, simple threshold]

Then I throw that into Tesseract, with psm 7 for a single line:

text = pytesseract.image_to_string(thresh, config='--psm 7')
print(text)
>>> lim

I also tried to restrict the possible characters, and it gets a bit better, but not quite.

text = pytesseract.image_to_string(thresh, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
print(text)
>>> 17m

OpenCV v4.1.1.
Tesseract v5.0.0-alpha.20190708

Any help appreciated.

I thought your image was not sharp enough, so I applied the process described in "How do I increase the contrast of an image in Python OpenCV" to first sharpen your image, and then extracted the blue layer and ran Tesseract.

I hope this helps.

import cv2
import pytesseract 

img = cv2.imread('test.png') #test.png is your original image
s = 128
img = cv2.resize(img, (s, s // 2), interpolation=cv2.INTER_AREA)  # pass interpolation by keyword; positionally it would land in a different parameter

def apply_brightness_contrast(input_img, brightness = 0, contrast = 0):

    if brightness != 0:
        if brightness > 0:
            shadow = brightness
            highlight = 255
        else:
            shadow = 0
            highlight = 255 + brightness
        alpha_b = (highlight - shadow)/255
        gamma_b = shadow

        buf = cv2.addWeighted(input_img, alpha_b, input_img, 0, gamma_b)
    else:
        buf = input_img.copy()

    if contrast != 0:
        f = 131*(contrast + 127)/(127*(131-contrast))
        alpha_c = f
        gamma_c = 127*(1-f)

        buf = cv2.addWeighted(buf, alpha_c, buf, 0, gamma_c)

    return buf

out = apply_brightness_contrast(img,0,64)

b, g, r = cv2.split(out)  # split the channels and use just the blue one

# 255-b inverts the image: it has a black background and white numbers, so this switches the colors
text = pytesseract.image_to_string(255-b, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
print(text)

Disclaimer: This is not a solution, just an attempt to partially solve this.

This process works only if you know the number of characters present in the image beforehand. Here is the trial code:

img0 = cv2.imread('171_m.png', 0)
adap_thresh = cv2.adaptiveThreshold(img0, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
text_adth = pytesseract.image_to_string(adap_thresh, config='--psm 7')

After adaptive thresholding, the produced image looks like this:

[Image: adaptive threshold image]

Pytesseract gives the output:

171 mi.

Now, if you know in advance the number of characters present, you can slice the string read by pytesseract and get the desired output, '171m'.
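A minimal sketch of that slicing step, assuming the expected length is counted without spaces. The helper name trim_ocr_text and its expected_chars parameter are made up for illustration; they are not part of pytesseract.

```python
def trim_ocr_text(raw, expected_chars):
    """Drop all whitespace, then keep only the first `expected_chars` characters."""
    compact = "".join(raw.split())  # "171 mi." becomes "171mi."
    return compact[:expected_chars]


# The string below is the Pytesseract output quoted above.
print(trim_ocr_text("171 mi.", 4))  # 171m
```

This only trims trailing junk. It cannot repair characters that were misread inside the first expected_chars positions.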

Before throwing the image into Pytesseract, preprocessing can help. The desired text should be in black while the background should be in white. Here's an approach:

  • Convert image to grayscale and enlarge image
  • Gaussian blur
  • Otsu's threshold
  • Invert image

After converting to grayscale, we enlarge the image using imutils.resize() and apply a Gaussian blur. From here we apply Otsu's threshold to get a binary image:

[Image: binary image after Otsu's threshold]

If you have noisy images, an additional step would be to use morphological operations to smooth or remove noise. But since your image is clean enough, we can simply invert the image to get our result:

[Image: inverted result]

Output from Pytesseract using --psm 6:

171m

import cv2
import pytesseract
import imutils

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image = cv2.imread('1.png',0)
image = imutils.resize(image, width=400)
blur = cv2.GaussianBlur(image, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh 

data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
