
Why does pytesseract not recognize correctly?

OK, so I've been trying to preprocess my image in every way I can think of, but I cannot seem to find the right settings.

This is the image: [image]

As you can see, the picture is already about as simple as it gets, but pytesseract still cannot recognize '1 BB' from it. Any tips?

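import numpy as np
import pytesseract
from PIL import Image, ImageEnhance

# img is assumed to be a grayscale NumPy array holding the captured image
img = Image.fromarray(img)
imp_arr = np.asarray(img)
# Binarize: pixel values below 140 become 0, all others become 255
imp_arr = (np.floor(imp_arr / 140.0) * 255.0).astype('uint8')
img = Image.fromarray(imp_arr, mode='L')
# Resize: up 3x, up 2x, then down to 0.3x
width, height = img.size
img = img.resize((width*3, height*3), Image.BICUBIC)
width, height = img.size
img = img.resize((width*2, height*2), Image.HAMMING)
width, height = img.size
img = img.resize((int(width*0.3), int(height*0.3)), Image.BICUBIC)
# Darken, then boost sharpness and contrast
img = ImageEnhance.Brightness(img).enhance(0.7)
img = ImageEnhance.Sharpness(img).enhance(2)
img = ImageEnhance.Contrast(img).enhance(2)
# OCR with a digits-only character whitelist
amount = pytesseract.image_to_string(img, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')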

This is just one example of what I've tried in order to get the correct text out as a string. Sometimes it works; other times it prints gibberish. The thing is, it needs to work every single time, especially for a picture as clear as this one. Is there a mastermind who has a simple solution to this problem? Thank you in advance.

After installing Tesseract OCR, Pillow and pytesseract, I saved your image as igor.png and ran the following code, which I found in the pytesseract docs:

#!/usr/bin/env python

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open("igor.png")))

It prints the expected result:

1BB

If I correct your initial code a bit by adding the letter B to the tessedit_char_whitelist, it works as well. With a digits-only whitelist, Tesseract is not allowed to output the letter B, so '1 BB' can never be returned.
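
For illustration, here is a minimal sketch of that whitelist correction. The helper names build_tesseract_config and read_text are hypothetical (they are not part of pytesseract), and the file name igor.png is simply the one used above. The --psm and --oem values are the ones from the question; only the whitelist changes.

```python
def build_tesseract_config(whitelist, psm=10, oem=3):
    """Return a Tesseract config string restricting output to `whitelist`.

    Hypothetical helper: it only assembles the same flags used in the
    question, with the whitelist passed in as a parameter.
    """
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def read_text(image_path, whitelist="0123456789B"):
    """Run OCR on the image at `image_path`, allowing digits and the letter B."""
    # Third-party imports are done lazily so that build_tesseract_config
    # can be used on its own without Pillow or Tesseract installed.
    from PIL import Image
    import pytesseract

    config = build_tesseract_config(whitelist)
    text = pytesseract.image_to_string(Image.open(image_path), config=config)
    # Strip surrounding whitespace that may come back with the OCR result.
    return text.strip()


# Example usage (requires Tesseract OCR, Pillow, pytesseract and the saved image):
# print(read_text("igor.png"))
```

Keeping the whitelist as a parameter makes it easy to add any other characters the image may contain. If results are still unstable, trying a different page segmentation mode may be worth a test: --psm 10 tells Tesseract to treat the image as a single character, while --psm 7 treats it as a single line of text.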
