
Unable to recognize correct digit on Pytesseract

I am unable to recognize the digit I want using Pytesseract. Is there anything I did wrong?

My Code

import cv2
import pytesseract

img = cv2.imread('foo.png')
# Try page segmentation modes 6 through 13 with the 'digits' config.
for psm in range(6, 14):
    text = pytesseract.image_to_string(img, config=f'--oem 3 --psm {psm} digits').replace('\n', '')
    print(f"psm {psm}: {text}")

My Input Images

Image 1

Result

psm 6: 
psm 7: 
psm 8: 
psm 9: 
psm 10: 
psm 11: 
psm 12: 4
psm 13: 

Image 2

Result

psm 6: .4
psm 7: 
psm 8: 
psm 9: 
psm 10: 
psm 11: 
psm 12: 
psm 13: 

Image 3

Result

psm 6: 4
psm 7: 4
psm 8: 4
psm 9: 4
psm 10: 4
psm 11: 4
psm 12: 
psm 13: 4

How can I get the result I want? Thanks for helping.

All images have a height of 252 pixels and a minimum width of 240 pixels.

Here are some things to try.

You are using isolated digits, so there's no context to help the recognizer, no help from a dictionary. Start with English sentences, then go down to English words, to verify things are working. Then try the harder task of isolated letters / numbers.

Try running a Gaussian blur over the image, thresholding it to binary, and asking for recognition of the result. Or, almost the same thing, reduce "bumpy" artifacts by simply downsizing from 252 px to something smaller. Remember that Tesseract was trained on 300 dpi and 600 dpi images of roughly 8 to 16 pt type. Very large images can paradoxically be bad for recognition.

A few of your images look like they might be skewed by some non-zero theta. Consider deskewing. Or better, consider generating ground truth images at various resolutions, which have zero skew. Ghostscript is one popular way to achieve that.

Your code passes --oem 3 as the OCR Engine Mode. According to tesseract --help-oem, 3 means "Default, based on what is available" (the combined legacy + LSTM mode is 2). Are you sure you need to specify the option? That is, do we see worse performance when we let it default?
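One way to check whether the explicit engine-mode option makes any difference is to run the same image with and without it. The helpers below are a hypothetical sketch: build_config and compare_oem are names invented here, and compare_oem needs pytesseract plus a working Tesseract install.

```python
def build_config(psm, oem=None, digits=True):
    """Build a Tesseract config string; omit --oem to let the engine default."""
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    parts.append(f"--psm {psm}")
    if digits:
        parts.append("digits")  # the 'digits' config used in the question
    return " ".join(parts)


def compare_oem(img, psm=8):
    """Run OCR once with the default engine mode and once with an explicit --oem 3."""
    # Imported here so build_config stays usable without Tesseract installed.
    import pytesseract

    results = {}
    for label, oem in (("default", None), ("oem 3", 3)):
        text = pytesseract.image_to_string(img, config=build_config(psm, oem))
        results[label] = text.strip()
    return results
```

If both entries of the returned dictionary match across your test images, the option can simply be dropped.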

Wow, there sure are a lot of Page Segmentation Modes. As mentioned above, you're not offering the engine much context. For isolated digits, if you write "1 2 3" in an image, or even "123", the engine has a better chance to verify its estimate of font size than for your example single-digit image. So think about what particular PSMs are good at, and take care to offer an image which plays to such strengths. The estimate for descender and baseline becomes much better once we've seen a few adjacent characters.
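To give the engine a few adjacent characters to work with, several single-digit images can be pasted side by side before recognition. This is a sketch under stated assumptions: the tiles are grayscale arrays of equal height with dark digits on a light background, and join_digits, gap, and margin are illustrative names and values.

```python
import numpy as np


def join_digits(tiles, gap=20, margin=20, background=255):
    """Place same-height grayscale digit tiles side by side on one canvas."""
    height = tiles[0].shape[0]
    if any(t.ndim != 2 or t.shape[0] != height for t in tiles):
        raise ValueError("tiles must be 2-D arrays of equal height")

    spacer = np.full((height, gap), background, dtype=np.uint8)
    row = []
    for i, tile in enumerate(tiles):
        if i:  # spacer between neighbours, not before the first tile
            row.append(spacer)
        row.append(tile.astype(np.uint8))
    line = np.hstack(row)

    # Background-coloured margin on all sides so no glyph touches the border.
    return np.pad(line, margin, mode="constant", constant_values=background)
```

The combined image is a single line of text, so a mode such as --psm 7 (treat the image as a single text line) is a natural one to try on it.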

Sorry, there are no easy answers. Looks like you have some experimentation ahead of you. Please let us know what you discover!

I ran Tesseract from the command line and got this output:

>tesseract 7.png - --psm 8
7
>tesseract 3.png - --psm 8
3
>tesseract 9.png - --psm 8
9
