
Pytesseract cannot recognize even a very simple text line

[Binary image B2] [Binary image Y2]

I think these images are quite simple and clear. Still, pytesseract does not recognize them, and I really wonder why.

Here is my code:

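from pytesseract import pytesseract as tesseract
import cv2 as cv

filepath = 'B2.png'  # hypothetical path to one of the binary images
# Read as a single-channel image; cv.imread returns None if the path is wrong
binary = cv.imread(filepath, cv.IMREAD_GRAYSCALE)

lang = 'eng'
# Tesseract variables must be passed with the -c flag, otherwise the string is treated as a config file name
config = '-c tessedit_char_whitelist=RGB123'
print(tesseract.image_to_string(binary, lang=lang, config=config))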

The output is just an empty string.

To Dennlinger's point, I would definitely rotate the image before sending it through PyTess. PyTess should rotate it automatically, though. Should.
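A minimal sketch of doing the rotation by hand, assuming the text line may be off by a multiple of 90° and that NumPy is available. The helper names `rotations` and `ocr_all_orientations` are hypothetical. `np.rot90` rotates counter-clockwise, and by default each orientation is passed to `pytesseract.image_to_string` with `--psm 7` (treat the image as a single text line), which is an assumption about these images rather than something from the question. The `ocr` parameter lets you swap in any other callable.

```python
import numpy as np


def rotations(image):
    """Yield (angle, rotated image) for the four 90-degree orientations (counter-clockwise)."""
    for k in range(4):
        yield 90 * k, np.rot90(image, k)


def ocr_all_orientations(image, ocr=None):
    """Run an OCR callable on every orientation and return {angle: stripped text}.

    Hypothetical helper: if no callable is given, pytesseract is used with
    --psm 7 (single text line), an assumed setting for these images.
    """
    if ocr is None:
        # Lazy import so the helper can be exercised without Tesseract installed
        import pytesseract

        def ocr(img):
            return pytesseract.image_to_string(img, lang='eng', config='--psm 7')

    # np.rot90 returns a view; make it contiguous before handing it to the OCR engine
    return {angle: ocr(np.ascontiguousarray(img)).strip()
            for angle, img in rotations(image)}
```

For example, `ocr_all_orientations(cv.imread(filepath, cv.IMREAD_GRAYSCALE))` returns a dict mapping each angle to the recognized text; the orientation that produces non-empty text is likely the upright one.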

Alternatively, I see in your configuration that you have whitelisted "RGB123", which, correct me if I'm wrong, means PyTess will only output those specific letters and digits.

I'd try omitting that whitelist from your configuration so that it can pick up the "Y" in there.
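A small sketch of building the `config` string with and without a whitelist, assuming the images should be read as a single text line (`--psm 7`). The helper `tesseract_config` is hypothetical; the `-c tessedit_char_whitelist=...` form is how Tesseract variables are set on the command line, and pytesseract forwards the `config` string to it. If you keep a whitelist, it has to contain every character on the images, e.g. adding `Y` for the Y2 image.

```python
def tesseract_config(psm=7, whitelist=None):
    """Build a pytesseract `config` string (hypothetical helper)."""
    # --psm 7: treat the image as a single text line (assumed to fit these images)
    parts = [f"--psm {psm}"]
    if whitelist:
        # Tesseract variables are set with -c NAME=VALUE; only these characters can be output
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


no_whitelist = tesseract_config()
with_y = tesseract_config(whitelist="RGBY123")
print(no_whitelist)  # --psm 7
print(with_y)        # --psm 7 -c tessedit_char_whitelist=RGBY123
```

Either string can then be passed as `config=` to `tesseract.image_to_string(binary, lang='eng', config=...)`.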

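Disclaimer: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.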

 