Pytesseract cannot recognize even very simple lines of text

Question

[Images: Binary image B2, Binary image Y2]

I think these images are quite simple and clear. Still, pytesseract does not work on them, and I really wonder why.

Here is my code:

from pytesseract import pytesseract as tesseract
import cv2 as cv

binary = cv.imread(filepath)

lang = 'eng'
config = 'tessedit_char_whitelist=RGB123'
print(tesseract.image_to_string(binary, lang=lang, config=config))

The output is just a blank string.

Answer 1

To Dennlinger's point, I would definitely rotate the image before sending it through PyTess. PyTess should rotate it automatically, though. Should.
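One way to check whether orientation is the problem is to run OCR on the image at each quarter turn and compare the results. The following is only a sketch: `try_rotations` and `ocr_file_all_rotations` are hypothetical helper names, and it assumes the text is off by a multiple of 90 degrees, which may not match your images.

```python
def try_rotations(image, ocr, rotate90):
    """Run `ocr` on the image at 0, 90, 180 and 270 degrees clockwise.

    Hypothetical helper: `ocr` maps an image to a string and `rotate90`
    returns the image turned a quarter turn. Returns {angle: text}.
    """
    results = {}
    for angle in (0, 90, 180, 270):
        results[angle] = ocr(image).strip()
        image = rotate90(image)
    return results


def ocr_file_all_rotations(filepath):
    # Imported here so try_rotations can be used without OCR dependencies.
    import cv2 as cv
    import pytesseract

    image = cv.imread(filepath)
    if image is None:  # cv.imread returns None when the file cannot be read
        raise FileNotFoundError(filepath)
    return try_rotations(
        image,
        ocr=pytesseract.image_to_string,
        rotate90=lambda img: cv.rotate(img, cv.ROTATE_90_CLOCKWISE),
    )
```

If one angle gives non-empty text and the others do not, orientation was likely the issue.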

Alternatively, I see in your configuration that you have whitelisted "RGB123", which, correct me if I'm wrong, may mean that PyTess is only looking for those specific letters and digits.

I'd try omitting the whitelist from your configuration so that it can pick up the "Y" in there.
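A minimal sketch of that suggestion follows. Note that Tesseract config variables are passed with a `-c` prefix, so a bare `tessedit_char_whitelist=...` string is unlikely to be applied as intended. The sketch also sets `--psm 7` (treat the image as a single text line), which is an assumption about your images, not something confirmed in the question. `build_config` and `read_line` are hypothetical helper names.

```python
def build_config(whitelist=None, psm=7):
    """Build a Tesseract config string (hypothetical helper, not part of pytesseract)."""
    parts = [f"--psm {psm}"]  # psm 7: assume the image holds one line of text
    if whitelist:
        # Config variables must be passed with the -c flag.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_line(filepath, whitelist=None):
    # Imported here so build_config can be used without OCR dependencies.
    import cv2 as cv
    import pytesseract

    image = cv.imread(filepath)
    if image is None:  # cv.imread returns None when the file cannot be read
        raise FileNotFoundError(filepath)
    text = pytesseract.image_to_string(
        image, lang="eng", config=build_config(whitelist)
    )
    return text.strip()
```

Calling `read_line(filepath)` runs without any whitelist; if you do want one, `read_line(filepath, whitelist="RGBY123")` includes the "Y" that the original list left out.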
