Python tesseract cannot read numbers from an image

Question

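I have a Python script that works for some images containing numbers and reads them correctly. An example of the kind of image that works is here: working image. I am now trying to use the script with a new kind of image that contains only digits, but it does not work. An example of the new kind of image is here: non-working image.

My script is as follows: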

try:
    from PIL import Image
    from PIL import ImageEnhance
except ImportError:
    import Image
import pytesseract

black = (0,0,0)
white = (255,255,255)
threshold = (160,160,160)

# Open input image in grayscale mode and get its pixels.
img = Image.open("./in/web_search.jpg").convert("LA")

# Multiply each pixel by 1.3
out = img.point(lambda i: i * 1.3)

enh = ImageEnhance.Contrast(out)
enh.enhance(1.3).show("30% more contrast")

pixels = out.getdata()

newPixels = []
# Compare each pixel against the threshold
for pixel in pixels:
    if pixel < threshold:
        newPixels.append(black)
    else:
        newPixels.append(white)

# Create and save new image.
newImg = Image.new("RGB",out.size)
newImg.putdata(newPixels)
newImg.save("./out/web_search.jpg")
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
print("-----------------------")
print(pytesseract.image_to_string(Image.open('./out/web_search.jpg'), lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=1234567890 --tessdata-dir="/usr/share/tesseract-ocr/4.00/tessdata/"'))
print("-----------------------")

The result for my new image is:

-----------------------
Riemer gaat bee 6 eee
-----------------------

Can anyone help? Thanks.

Answer 1

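You will probably need to do some work to get Tesseract to read this image. Some things you can try:

Tesseract lets you restrict the set of characters it may output. Set it to digits only.
Use some form of preprocessing to remove noise: one of Pillow's noise-removal filters, or a morphological opening/closing.
Fine-tune the network with additional training.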
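
The first two suggestions could look roughly like the sketch below. It assumes Pillow and pytesseract are installed and that the image holds a single line of digits. The helper names build_digit_config, preprocess and read_digits are made up for illustration, and the threshold of 160 is simply carried over from the question. Note that --psm 10 in the question tells Tesseract to expect a single character; --psm 7 (a single text line) is used here on the assumption that the number has several digits.

```python
def build_digit_config(psm=7, oem=3):
    """Return a Tesseract config string that limits output to the digits 0-9."""
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist=0123456789"


def preprocess(path, threshold=160):
    """Load an image, smooth speckle noise, and binarize it to black and white."""
    # Imported lazily so the config helper above works without Pillow installed.
    from PIL import Image, ImageFilter

    # Plain 8-bit grayscale, so every pixel is a single integer.
    gray = Image.open(path).convert("L")
    # A 3x3 median filter removes isolated specks while keeping edges fairly sharp.
    smooth = gray.filter(ImageFilter.MedianFilter(size=3))
    # Map every pixel to pure black or pure white.
    return smooth.point(lambda p: 255 if p >= threshold else 0)


def read_digits(path, psm=7):
    """Run Tesseract on the preprocessed image and return the recognized text."""
    import pytesseract

    return pytesseract.image_to_string(
        preprocess(path), lang="eng", config=build_digit_config(psm)
    ).strip()
```

Calling read_digits("./in/web_search.jpg") would then return whatever text Tesseract recognizes. If letters still show up in the output, the whitelist may be getting ignored: as far as I know, the LSTM engine in Tesseract 4.0 did not honor tessedit_char_whitelist, while 4.1 and later do, so checking the installed version is worth a try.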

Solution 1
0 2021-11-18 10:12:33
