Tesseract OCR gives really bad output even with typed text

Question

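I've been trying to get tesseract OCR to extract some digits from a pre-cropped image, and it isn't working well even though the image is fairly clear. I've looked around for solutions, but all the other questions I've seen on here deal with problems caused by cropping or skewed text.

Here is a sample of my code, which tries to read the image and print the output to the command line.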

    import cv2
    import pytesseract

    #im is assumed to be the BGR image already loaded from im_path

    #convert image to greyscale for OCR
    im_g = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    #create threshold image to simplify things.
    im_t = cv2.threshold(im_g, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)[1]

    #define kernel size
    rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (20,20))

    #Apply dilation to threshold image
    im_d = cv2.dilate(im_t, rect_kernel, iterations = 1)

    #Find contours (note: taken from im_t; the dilated im_d is not used)
    contours = cv2.findContours(im_t, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0]

    for cnt in contours:
        x,y,w,h = cv2.boundingRect(cnt)

        #crop each bounding box out of the original image
        im_c = im[y:y+h, x:x+w]

        #OCR the crop and print the result next to the file name
        speed = pytesseract.image_to_string(im_c)
        print(im_path +" : " + speed)

Here is an example of the image.

The output is:

frame10008.jpg : VAeVAs}

By adding the following config to the tesseract image_to_string function, I got a slight improvement on some images:

config="--psm 7"

Without the new config, it didn't detect anything in this image. Now it outputs

frame100.jpg : | U |

Any ideas on what I'm doing wrong? Is there a different approach I could take to solve this problem? I'm open to not using Tesseract at all.

Answer 1

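I found a decent workaround. First, I made the image larger. Giving tesseract more area to work with helped it a lot. Second, to get rid of non-digit output, I used the following config on the image_to_string function: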

config = "--psm 7 outputbase digits"

The line now looks like this:

speed = pytesseract.image_to_string(im_c, config = "--psm 7 outputbase digits")
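For anyone who wants the enlarging step spelled out, here is a minimal sketch of how it could look. The helper names `build_config` and `ocr_digits`, the scale factor of 3, and the cubic interpolation are my assumptions, not values from the original code, so tune them for your own frames.

```python
def build_config(psm: int = 7, digits_only: bool = True) -> str:
    """Build the Tesseract config string used in this answer."""
    config = f"--psm {psm}"
    if digits_only:
        # Same digits restriction as in the config shown above.
        config += " outputbase digits"
    return config


def ocr_digits(crop, scale: float = 3.0) -> str:
    """Upscale a cropped image, then OCR it as a single line of digits.

    `scale` is an assumed value; larger is not always better.
    """
    # Imported here so build_config() can be used without OpenCV
    # or pytesseract installed.
    import cv2
    import pytesseract

    # Enlarge the crop so the characters cover more pixels.
    enlarged = cv2.resize(crop, None, fx=scale, fy=scale,
                          interpolation=cv2.INTER_CUBIC)
    return pytesseract.image_to_string(enlarged, config=build_config()).strip()
```

Inside the loop this would replace the direct call, e.g. `speed = ocr_digits(im_c)`. Another commonly used way to restrict the output is `-c tessedit_char_whitelist=0123456789`, though I have not compared it against the digits config on these images.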

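The returned data is far from perfect, but the success rate is high enough that I should be able to clean up the garbage data and interpolate wherever tesseract doesn't return a number.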
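The cleanup and interpolation could be sketched roughly as below. This assumes one reading per frame, in frame order, and that linear interpolation between neighbouring good readings is acceptable; `parse_reading` and `fill_gaps` are hypothetical helper names, and the numeric values in the usage lines are made up for illustration.

```python
import re
from typing import List, Optional


def parse_reading(raw: str) -> Optional[int]:
    """Return the integer found in an OCR string, or None if it has no digits."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not 0-9
    return int(digits) if digits else None


def fill_gaps(values: List[Optional[int]]) -> List[Optional[float]]:
    """Linearly interpolate None entries that lie between two known readings.

    Missing readings at the very start or end are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Made-up readings: two good frames with two garbage frames between them.
raw_outputs = ["75\n", "| U |", "VAeVAs}", "78"]
readings = [parse_reading(r) for r in raw_outputs]
print(fill_gaps(readings))
```

Note that `parse_reading` simply strips non-digit characters, so a string with stray digits in the noise would still be accepted; a range check on the result may be worth adding.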

Answer 2

I tried inverting the foreground and background pixel values, then OCRed the image with the image_to_data function, and got the expected result: 7576

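from PIL import Image
import pytesseract

# invert foreground and background (gray_image is the grayscale OpenCV image)
gray_image = 255 - gray_image
# convert OpenCV image to PIL image data format
gray_pil = Image.fromarray(gray_image)

# OCR image
config = ('-l eng --oem 1 --psm 7')
text = pytesseract.image_to_data(gray_pil, config=config, output_type='dict')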
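Since image_to_data returns a dictionary of parallel lists rather than a plain string, the recognized text still has to be pulled out of it. A small sketch of one way to do that follows; `words_from_data` and the `min_conf` threshold are hypothetical, and the sample dictionary is hand-made to mimic the shape of the result (only the `text` and `conf` keys are used), not real Tesseract output.

```python
from typing import Dict, List


def words_from_data(data: Dict[str, list], min_conf: float = 0.0) -> List[str]:
    """Collect non-empty words whose confidence is at least min_conf."""
    words = []
    for word, conf in zip(data["text"], data["conf"]):
        # Structural rows (page, block, line) have empty text and conf -1;
        # conf may arrive as a string or a number, so normalise it with float().
        if word.strip() and float(conf) >= min_conf:
            words.append(word.strip())
    return words


# Hand-made stand-in for the dictionary returned by image_to_data.
sample = {
    "text": ["", "", "", "", "7576"],
    "conf": ["-1", "-1", "-1", "-1", "96"],
}
print(words_from_data(sample))
```

With the real result this would be called as `words_from_data(text)`, where `text` is the dictionary from the snippet above.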
