简体   繁体   中英

Digit recognition in python (OpenCV and pytesseract)

I'm currently trying to detect numbers from small screenshots. However, I have found the accuracy to be quite poor. I've been using OpenCV, the image is captured in RGB and converted to greyscale, then thresholding has been performed using a global value (I found adaptive didn't work so well).

Here is an example grey-scale of one of the numbers, followed by an example of the image post thresh-holding (the numbers can range from 1-99). Note that the initial screenshot of the image is quite small and is thus enlarged.

在此处输入图像描述

在此处输入图像描述

Any suggestions on how to improve accuracy using OpenCV or a different system altogether are much appreciated. Some code included below, the function is passed a screenshot in RGB of the number.

def getNumber(image):
    image = cv2.resize(image, (0, 0), fx=3, fy=3)
    img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    thresh, image_bin = cv2.threshold(img, 125, 255, cv2.THRESH_BINARY)

    image_final = PIL.Image.fromarray(image_bin)

    txt = pytesseract.image_to_string(
        image_final, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
    return txt

Here's what i could improve, using otsu treshold is more efficent to separate text from background than giving an arbitrary value. Tesseract works better with black text on white background, and i also added padding as tesseract struggle to recognize characters if they are too close to the border.

This is the final image [final_image][1] and pytesseract manage to read "46"

import cv2,numpy,pytesseract
def getNumber(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    # Otsu Tresholding automatically find best threshold value
    _, binary_image = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
    
    # invert the image if the text is white and background is black
    count_white = numpy.sum(binary_image > 0)
    count_black = numpy.sum(binary_image == 0)
    if count_black > count_white:
        binary_image = 255 - binary_image
        
    # padding
    final_image = cv2.copyMakeBorder(image, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=(255, 255, 255))
    txt = pytesseract.image_to_string(
        final_image, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')

    return txt

Function is executed as:

>> getNumber(cv2.imread(img_path))

EDIT: note that you do not need this line:

image_final = PIL.Image.fromarray(image_bin)

as you can pass to pytesseractr an image in numpy array format (wich cv2 use), and Tesseract accuracy only drops for characters under 35 pixels (and also bigger, 35px height is actually the optimal height) so i did not resize it. [1]: https://i.stack.imgur.com/OaJgQ.png

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM