
Improve OCR result from image using pytesseract

I am using pytesseract to read numbers from the screen in real time. The image contains mostly digits, dots, and two letters (M and R), as shown below. The numbers keep changing in real time, but the letters M and R stay in the same place. The background is always green with black characters.

[image]

As you can see, the numbers in the image are very clear, but the result pytesseract returns is not satisfactory. Sometimes it reads a 7 as a 1. I would like to find algorithms that help improve the OCR result.

Currently I am using Pillow to convert the image to grayscale, and I have also tried resizing the image larger or smaller, but that does not improve the result much. I also applied a filter to the image as shown below, but the result is still not 100% correct.

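import cv2
import pytesseract as tess  # assumed alias; the original snippet does not show its imports

scale_factor = 2  # assumed value; the original snippet does not define it
img = cv2.imread('screenshot.png')
img = cv2.resize(img, None, fx=scale_factor, fy=scale_factor, interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Smooth with a bilateral filter, then binarize with Otsu's threshold
img = cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
text = tess.image_to_string(img)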

Please suggest any algorithms that could help improve this OCR result.

You can easily detect the text by applying simple thresholding.

Threshold: [image]

Result:
3845.86 M51.31 M 309.12
3860.43 R191.90 R23.44
  • Thresholding will show the features of the image.
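To make the thresholding step more concrete, here is a minimal pure-Python sketch of how Otsu's method (the variant selected by cv2.THRESH_OTSU in this answer's code) can pick a threshold: it tries every gray level and keeps the one that best separates dark pixels from light ones. The function names otsu_threshold and binarize_inv are hypothetical and only for illustration; OpenCV's own implementation is optimized and may differ in details such as tie-breaking.

```python
def otsu_threshold(pixels):
    """Return a threshold t (0-255) that maximizes between-class variance.

    `pixels` is a flat iterable of 8-bit grayscale values.
    Pixels <= t form the dark class, pixels > t the light class.
    """
    # Histogram of gray levels
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no separation to score
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize_inv(pixels, t):
    """Inverted binary threshold: pixels <= t become 255, the rest 0."""
    return [255 if p <= t else 0 for p in pixels]


# Tiny made-up sample: dark "text" pixels (30) on a lighter background (180)
row = [30, 30, 180, 180, 30, 180]
t = otsu_threshold(row)
print(t, binarize_inv(row, t))
```

With dark characters on a lighter background, the inverted threshold turns the characters white and the background black, which is what the thresholded image in this answer shows.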

Code:

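import cv2
import pytesseract

# Load the image and convert it to grayscale
img = cv2.imread("UEWHj.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Inverted binary threshold; Otsu's method picks the threshold value automatically
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Run OCR on the thresholded image and print the recognized text
txt = pytesseract.image_to_string(thr)
print(txt)
# Show the thresholded image until a key is pressed
cv2.imshow("thr", thr)
cv2.waitKey(0)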
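Since the question says the image only contains digits, dots, and the letters M and R, one further thing that may be worth trying is restricting Tesseract to that character set. The sketch below is not part of the original answer. The helper names build_tesseract_config and read_digits are hypothetical, and the page segmentation mode 6 (a single uniform block of text) is an assumption about the layout. Whether tessedit_char_whitelist is honored depends on the installed Tesseract version and engine.

```python
def build_tesseract_config(allowed_chars="0123456789.MR", psm=6):
    """Build a Tesseract config string limiting recognition to `allowed_chars`.

    psm=6 asks Tesseract to treat the image as a single uniform block of text
    (an assumption about the layout; adjust it for your screenshot).
    """
    return f"--psm {psm} -c tessedit_char_whitelist={allowed_chars}"


def read_digits(binary_image, config=None):
    """Run OCR on an already-thresholded image using the restricted charset."""
    # Imported lazily so build_tesseract_config works without Tesseract installed
    import pytesseract

    if config is None:
        config = build_tesseract_config()
    return pytesseract.image_to_string(binary_image, config=config)


print(build_tesseract_config())
```

With the thresholded image thr from the code above, a call would look like txt = read_digits(thr).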
