Python tesseract cannot read numbers from image

Question

I have a Python script that reads numbers from some images correctly. The images that work look like this: Working image. I'm trying to use the script with a new kind of image that contains only numbers, but it does not work. The new image type looks like this: Non working image

My script is as follows:

try:
    from PIL import Image
    from PIL import ImageEnhance
except ImportError:
    import Image
import pytesseract

black = (0,0,0)
white = (255,255,255)
threshold = (160,160,160)

# Open input image in grayscale mode ("LA" = grayscale plus alpha).
img = Image.open("./in/web_search.jpg").convert("LA")

# Brighten: multiply each pixel value by 1.3
out = img.point(lambda i: i * 1.3)

enh = ImageEnhance.Contrast(out)
# Note: the contrast-enhanced image is only displayed; the thresholding below still uses `out`
enh.enhance(1.3).show("30% more contrast")

pixels = out.getdata()

newPixels = []
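# Compare each pixel with the threshold. Each pixel is an (L, A) tuple,
# so the tuple comparison effectively tests the L (gray) value first.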
for pixel in pixels:
    if pixel < threshold:
        newPixels.append(black)
    else:
        newPixels.append(white)

# Create and save new image.
newImg = Image.new("RGB",out.size)
newImg.putdata(newPixels)
newImg.save("./out/web_search.jpg")
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
print("-----------------------")
print(pytesseract.image_to_string(Image.open('./out/web_search.jpg'), lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=1234567890 --tessdata-dir="/usr/share/tesseract-ocr/4.00/tessdata/"'))
print("-----------------------")

The result with my new image is:

-----------------------
Riemer gaat bee 6 eee
-----------------------

Any help please? Thanks.

Answer 1

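You'll probably need to do some extra work before Tesseract picks up the digits in that image. Some things you can do are:

Limit the set of characters Tesseract may output to digits only. Your script already passes -c tessedit_char_whitelist=1234567890, yet the output still contains letters, so check whether your Tesseract version honours the whitelist with the LSTM engine (--oem 3); some Tesseract 4.0 releases ignore it there.
Use some form of preprocessing to remove the noise, either a Pillow filter such as ImageFilter.MedianFilter or a morphological opening/closing.
Fine-tune Tesseract's LSTM model on images like yours.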
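The first two suggestions can be combined into a short Pillow pipeline. This is a minimal sketch, not a verified fix for the image in the question: it assumes dark digits on a light background, the helper names preprocess_digits and ocr_digits are hypothetical, the threshold of 160 is taken from the question's script, and the 3-pixel speck size is an illustrative guess to tune. MaxFilter followed by MinFilter is a grayscale closing, which acts as an opening of the dark digit strokes and removes specks smaller than the window. It also swaps --psm 10 (single character) for --psm 7 (single text line), on the assumption that each image holds one line of digits.

```python
from PIL import Image, ImageFilter


def preprocess_digits(img, threshold=160, speck_size=3):
    """Binarize an image and remove small dark specks.

    Assumes dark digits on a light background. `threshold` and
    `speck_size` are illustrative starting values; tune them per image set.
    """
    # Plain 8-bit grayscale ("L"): one int per pixel, no alpha channel.
    gray = img.convert("L")
    # Global threshold: pixels darker than `threshold` become black (0),
    # everything else becomes white (255).
    binary = gray.point(lambda v: 0 if v < threshold else 255)
    # MaxFilter shrinks dark regions, deleting specks smaller than the
    # window; MinFilter then grows the surviving strokes back.
    opened = binary.filter(ImageFilter.MaxFilter(speck_size))
    opened = opened.filter(ImageFilter.MinFilter(speck_size))
    return opened


def ocr_digits(img):
    """Run Tesseract on a preprocessed image, allowing digits only."""
    import pytesseract  # imported lazily so preprocessing works without it

    # --psm 7: treat the image as a single text line (assumption about the
    # new images); the whitelist restricts the output to digits.
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    return pytesseract.image_to_string(img, config=config).strip()
```

Assuming pytesseract and Tesseract are installed, usage would be something like print(ocr_digits(preprocess_digits(Image.open("./in/web_search.jpg")))). If you save intermediate results, use PNG rather than JPEG, since lossy JPEG compression can reintroduce gray noise after thresholding. OpenCV's cv2.morphologyEx is an alternative for the filtering step (MORPH_CLOSE for dark text on a light background).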

solution1
0 2021-11-18 10:12:33