
Tesseract OCR gives really bad output even with typed text

I've been trying to get Tesseract OCR to extract some digits from a pre-cropped image, and it isn't working well at all even though the images are fairly clear. I've looked around for solutions, but all the other questions I've seen on here involve problems with cropping or skewed text.

Here's an example of my code, which tries to read the image and print the result to the command line.

    import cv2
    import pytesseract

    # Read the pre-cropped frame
    im = cv2.imread(im_path)

    # Convert image to greyscale for OCR
    im_g = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    # Create threshold image (Otsu, inverted) to simplify things
    im_t = cv2.threshold(im_g, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)[1]

    # Define kernel for dilation
    rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (20, 20))

    # Apply dilation to the threshold image
    im_d = cv2.dilate(im_t, rect_kernel, iterations=1)

    # Find contours
    contours = cv2.findContours(im_t, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0]

    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)

        # Crop the bounding box out of the original image
        im_c = im[y:y+h, x:x+w]

        speed = pytesseract.image_to_string(im_c)
        print(im_path + " : " + speed)

Here's an example of an image

The output for it is:

frame10008.jpg : VAeVAs}

I've gotten a tiny improvement in some images by adding the following config to the pytesseract image_to_string function:

    config="--psm 7"
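For reference, that option is passed as the config argument of image_to_string; with the im_c crop from the loop above, the call would look like this:

    speed = pytesseract.image_to_string(im_c, config="--psm 7")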

Without the new config, it would detect nothing for this image. Now it outputs:

frame100.jpg : | U |

Any ideas as to what I'm doing wrong? Is there a different approach I could be taking to solve this problem? I'm open to not using Tesseract at all.

I've found a decent workaround. First, I made the image larger; giving Tesseract more area to work with helped a lot. Second, to get rid of non-digit outputs, I used the following config with the image_to_string function:

    config = "--psm 7 outputbase digits"

That line now looks like this:

    speed = pytesseract.image_to_string(im_c, config="--psm 7 outputbase digits")

The data coming back is far from perfect, but the success rate is high enough that I should be able to clean up the garbage data and interpolate where Tesseract returns no digits.
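A minimal sketch of the enlargement step described above, assuming cv2.resize on the im_c crop with a 3x scale factor (the scale factor and interpolation choice are my own example values, not from the original post):

    # Enlarge the crop before OCR; fx/fy of 3 is an assumed example value
    im_big = cv2.resize(im_c, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

    # Restrict Tesseract to a single line of digits
    speed = pytesseract.image_to_string(im_big, config="--psm 7 outputbase digits")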

I tried inverting the foreground and background pixel values and OCRing the image with the image_to_data function, and got the expected result: 7576

    import pytesseract
    from PIL import Image

    # Invert so the digits are dark on a light background
    gray_image = 255 - gray_image

    # Convert OpenCV (NumPy) image to PIL image format
    gray_pil = Image.fromarray(gray_image)

    # OCR the image, treating it as a single line of text
    config = '-l eng --oem 1 --psm 7'
    text = pytesseract.image_to_data(gray_pil, config=config, output_type='dict')
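Since image_to_data returns a dict of per-word fields rather than a single string, the recognized digits can be read out of its 'text' entries. A small sketch of that step (the filtering and join are my own additions):

    # Join the non-empty word entries to get the final string, e.g. "7576"
    words = [w for w in text['text'] if w.strip()]
    print(" ".join(words))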
