Python tesseract cannot read numbers from image

I have a Python script that correctly reads the numbers in some images. The type of image that works is here: Working image. I'm trying to use the script on a new kind of image that contains only numbers, but it does not work. The new image type is here: Non working image.

My script is as follows:

try:
    from PIL import Image
    from PIL import ImageEnhance
except ImportError:
    import Image
import pytesseract

black = (0,0,0)
white = (255,255,255)
threshold = (160,160,160)

# Open input image in grayscale mode and get its pixels.
img = Image.open("./in/web_search.jpg").convert("LA")

# Multiply each pixel value by 1.3 to brighten the image.
out = img.point(lambda i: i * 1.3)

enh = ImageEnhance.Contrast(out)
enh.enhance(1.3).show("30% more contrast")

pixels = out.getdata()

newPixels = []
# Compare each pixel 
for pixel in pixels:
    if pixel < threshold:
        newPixels.append(black)
    else:
        newPixels.append(white)

# Create and save new image.
newImg = Image.new("RGB",out.size)
newImg.putdata(newPixels)
newImg.save("./out/web_search.jpg")
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
print("-----------------------")
print(pytesseract.image_to_string(Image.open('./out/web_search.jpg'), lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=1234567890 --tessdata-dir="/usr/share/tesseract-ocr/4.00/tessdata/"'))
print("-----------------------")

The result with my new image is:

-----------------------
Riemer gaat bee 6 eee
-----------------------

Any help please? Thanks.

You'll probably need to do some work before Tesseract picks those digits up. Some things you can do are:

  1. Tesseract allows you to limit the character range which may be used. Set it to numbers only.
  2. Use some form of preprocessing to remove the noise: either a Python Pillow noise-removal filter, or morphological opening/closing.
  3. Perform fine-tuning training on the network.
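
A rough sketch of points 1 and 2 with Pillow only, in case it helps. The names clean_for_ocr, read_digits and DIGITS_CONFIG are made up for this example, the threshold of 160 is copied from the question, and the 3x3 kernel and "--psm 7" (treat the image as a single text line) are assumptions that may need tuning for your images. It also assumes dark digits on a light background.

```python
from PIL import Image, ImageFilter

# Digits-only whitelist; "--psm 7" (single text line) is an assumption.
DIGITS_CONFIG = "--psm 7 --oem 3 -c tessedit_char_whitelist=0123456789"


def clean_for_ocr(img, threshold=160, kernel=3):
    """Return a black-on-white binary image with small specks removed."""
    gray = img.convert("L")
    # A median filter suppresses salt-and-pepper noise before thresholding.
    gray = gray.filter(ImageFilter.MedianFilter(kernel))
    # Binarize: pixels below the threshold become black, the rest white.
    binary = gray.point(lambda p: 255 if p >= threshold else 0)
    # The text is dark on light, so MaxFilter erodes the dark strokes and
    # MinFilter grows them back: a morphological opening of the dark
    # foreground, which deletes specks smaller than the kernel.
    opened = binary.filter(ImageFilter.MaxFilter(kernel))
    return opened.filter(ImageFilter.MinFilter(kernel))


def read_digits(path):
    """OCR the cleaned image; needs pytesseract and a Tesseract install."""
    import pytesseract  # imported lazily so the cleanup works on its own

    cleaned = clean_for_ocr(Image.open(path))
    return pytesseract.image_to_string(cleaned, config=DIGITS_CONFIG).strip()
```

Calling read_digits("./in/web_search.jpg") would run the whole chain on the input file from the question. Passing the cleaned image straight to pytesseract also avoids re-saving it as a JPEG, which can reintroduce compression artifacts around the digits. Whether this is enough depends on the image; if it is not, the fine-tuning in point 3 is the remaining option.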
