Tesseract output changing, adding, and removing numbers from very clear image

I am working on a program that uses a webcam to read constantly changing digits off of a screen using pytesseract (long story). It takes an image of the whole screen, then cuts out each number that needs to be recorded (there are 23 of them) using predetermined coordinates stored in a list called 'roi'. There are some other steps, but this is the most important part. Currently it is adding, deleting, and changing digits constantly, but not consistently. Here are some examples:

It reads this incorrectly as '32.0' [1]

It reads this correctly as '52.0' [2]

It reads this incorrectly as '39.3' [3]

It reads this incorrectly as '2499.1' [4]

These images have already been processed using OpenCV, and it's what all the images in the roi set look like. Based on other answers, I have binarized it, tried to clean up the edges, and put a white border around the image (see code).

This program reads the screen every 30 seconds, sometimes getting it right and other times getting it wrong. It often changes 5s into 3s, 3s into 5s, and 5s into 9s. Sometimes it misses digits or adds extra ones. Below is my code for processing the images.

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'<tesseract file path>'
scale = 1.4
img = cv2.imread(r'<image file path>')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.rotate(img, cv2.ROTATE_180)
# Shrink the whole frame by the scale factor
width = int(img.shape[1] / scale)
height = int(img.shape[0] / scale)
dim = (width, height)
img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
cv2.destroyAllWindows()

myData = []
cong = r'--psm 6 -c tessedit_char_whitelist=+0123456789.-'

# roi is defined elsewhere: a list of ((x1, y1), (x2, y2)) corner pairs
for x, r in enumerate(roi):
    # Crop one number: rows are y1:y2, columns are x1:x2
    imgCrop = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]
    # Dividing by 0.2 enlarges the crop five times
    scalebig = 0.2
    wid = int(imgCrop.shape[1] / scalebig)
    hei = int(imgCrop.shape[0] / scalebig)
    newdims = (wid, hei)
    imgCrop = cv2.resize(imgCrop, newdims)

    # Binarize with a fixed threshold
    imgCrop = cv2.threshold(imgCrop, 155, 255, cv2.THRESH_BINARY)[1]

    # Morphological closing to clean up the edges
    kernel2 = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    imgCrop = cv2.morphologyEx(imgCrop, cv2.MORPH_CLOSE, kernel2, iterations=2)

    # Add a 10-pixel white border on every side
    value = [255, 255, 255]
    imgCrop = cv2.copyMakeBorder(imgCrop, 10, 10, 10, 10, cv2.BORDER_CONSTANT, None, value=value)

    datapoint = pytesseract.image_to_string(imgCrop, lang='eng', config=cong)
    myData.append(datapoint)

The output is the pictures I linked above.

I have looked into fine-tuning it, but I have a Windows machine and I can't seem to find a good tutorial. I am not a programmer by trade; I spent 2 months teaching myself Python to do this, but the machine learning aspect of Tesseract has me spinning, and I don't know how else to fix such remarkably inconsistent readings. If you need any further info, please ask and I'll be happy to provide it.

Edit: Added some more incorrectly read images for reference

  1. Make sure you use a suitable image format. JPEG is the wrong format for OCR because its lossy compression blurs character edges; use a lossless format such as PNG or TIFF.
  2. With the Tesseract LSTM engine, make sure the letter size is no larger than 35 points.
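
A minimal sketch of both points, assuming the 35-point limit can be approximated as a digit height of roughly 30 pixels in the cropped image. That mapping is an assumption for illustration, not a documented Tesseract rule, and the helper names `scale_for_glyph_height`, `scaled_size`, and `resize_and_save_png` are hypothetical.

```python
def scale_for_glyph_height(glyph_height_px, target_height_px=30.0):
    """Return the resize factor that brings digits to the target height."""
    if glyph_height_px <= 0 or target_height_px <= 0:
        raise ValueError("heights must be positive")
    return target_height_px / glyph_height_px


def scaled_size(width, height, factor):
    """Return (width, height) after scaling, never smaller than 1 pixel."""
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def resize_and_save_png(img, glyph_height_px, out_path, target_height_px=30.0):
    """Resize a crop so its digits are near the target height, then save it.

    out_path should end in .png so OpenCV picks the lossless PNG encoder.
    """
    import cv2  # imported lazily so the sizing helpers work without OpenCV

    factor = scale_for_glyph_height(glyph_height_px, target_height_px)
    new_size = scaled_size(img.shape[1], img.shape[0], factor)
    # INTER_AREA suits shrinking; INTER_CUBIC suits enlarging.
    interp = cv2.INTER_AREA if factor < 1 else cv2.INTER_CUBIC
    resized = cv2.resize(img, new_size, interpolation=interp)
    # PNG is lossless, so saving adds no compression artifacts to the digits.
    if not cv2.imwrite(out_path, resized):
        raise IOError(f"could not write {out_path}")
    return resized
```

For a crop whose digits are about 120 pixels tall, `scale_for_glyph_height(120)` gives 0.25, so the crop would be shrunk to a quarter of its size before OCR rather than enlarged.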

Using Tesseract with the tessdata_best models, I got these results:

tesseract 593_small.png -
59.3

tesseract 520_small.png -
52.0

tesseract 2491_small.png -
249.1
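
The same kind of run can be sketched from Python through pytesseract. This assumes the tessdata_best `eng.traineddata` file has been downloaded into a local folder. The helper names `build_config` and `read_number` and their optional `psm` and `whitelist` arguments are hypothetical additions for illustration; the command-line runs above used none of these extra settings.

```python
def build_config(tessdata_dir=None, psm=None, whitelist=None):
    """Build a Tesseract config string from optional settings."""
    parts = []
    if tessdata_dir is not None:
        # The path is wrapped in double quotes so paths with spaces survive.
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    if psm is not None:
        parts.append(f"--psm {int(psm)}")
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_number(image_path, tessdata_dir=None, psm=None, whitelist=None):
    """Run Tesseract on one image file and return the stripped text."""
    import pytesseract  # imported lazily so build_config works without it

    config = build_config(tessdata_dir, psm, whitelist)
    # image_to_string accepts a file path as well as an image object.
    text = pytesseract.image_to_string(image_path, lang="eng", config=config)
    return text.strip()
```

Calling `read_number("593_small.png", tessdata_dir=...)` with the folder that holds the tessdata_best model would be the Python counterpart of `tesseract 593_small.png -`.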
