How to read numbers from an image using pytesseract

Question

I am trying to read the numbers from this image:

using pytesseract with these settings:

custom_config = r'--oem 3 --psm 6'
pytesseract.image_to_string(img, config=custom_config)

This is the output:

((E ST7 [71aT6T2 ] THETOGOG5 15 [8)

Answer 1

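Whitelisting only the digits and changing the psm gives better results. You also need to strip the newlines and spaces from the output. The code below does this.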

import re

import pytesseract
from PIL import Image

# Open the image
im = Image.open("numbers.png")

# Whitelist digits only; --psm 11 treats the image as sparse text
custom_config = r'--oem 3 --psm 11 -c tessedit_char_whitelist=0123456789'

# Run OCR on the image
numbers_string = pytesseract.image_to_string(im, config=custom_config)

# Strip everything that is not a digit (newlines, spaces, stray characters)
digits = re.sub(r'\D', '', numbers_string)

# Print the result
print(digits)

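Running this code on your image gives: '31477423353'

Unfortunately, some digits are still missing. As an experiment, I downloaded your image and erased the grid.

After removing the grid and running the code again, pytesseract produced a perfect result: '314774628300558'

So it is worth looking into how to remove the grid programmatically. There are alternatives to pytesseract, but whichever tool you use, you will get better output when the text is isolated in the image.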
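One possible way to remove the grid programmatically is sketched below. This is not the method I used for the experiment above (I erased the grid by hand), just a simple heuristic: any pixel row or column that is mostly ink is assumed to be a grid line and gets blanked. The function names, `line_fraction`, and `ink_threshold` are hypothetical and would need tuning for your image, and the sketch assumes dark digits and grid lines on a light background.

```python
def remove_grid_lines(pixels, line_fraction=0.6):
    """Blank rows and columns that are mostly ink (assumed to be grid lines).

    pixels: list of rows, each a list of 0 (background) or 1 (ink).
    line_fraction: hypothetical threshold; a row or column whose share of
    ink pixels is at least this value is treated as a grid line.
    """
    if not pixels or not pixels[0]:
        return [row[:] for row in pixels]
    height, width = len(pixels), len(pixels[0])
    # Rows where ink covers most of the width
    line_rows = {y for y, row in enumerate(pixels)
                 if sum(row) >= line_fraction * width}
    # Columns where ink covers most of the height
    line_cols = {x for x in range(width)
                 if sum(pixels[y][x] for y in range(height)) >= line_fraction * height}
    return [[0 if (y in line_rows or x in line_cols) else pixels[y][x]
             for x in range(width)]
            for y in range(height)]


def remove_grid_from_image(path, ink_threshold=128, line_fraction=0.6):
    """Load an image, remove assumed grid lines, return a new PIL image."""
    from PIL import Image  # imported here so the core function has no dependencies

    gray = Image.open(path).convert("L")
    width, height = gray.size
    data = gray.tobytes()  # one byte per pixel, row by row, in mode "L"
    # Pixels darker than ink_threshold count as ink
    pixels = [[1 if data[y * width + x] < ink_threshold else 0
               for x in range(width)]
              for y in range(height)]
    cleaned = remove_grid_lines(pixels, line_fraction)
    # Write ink as black (0) and background as white (255)
    out_bytes = bytes(0 if v else 255 for row in cleaned for v in row)
    return Image.frombytes("L", (width, height), out_bytes)
```

The image returned by `remove_grid_from_image("numbers.png")` can be passed to `pytesseract.image_to_string` in place of `im`. Note that blanking whole rows and columns also cuts a thin gap through any digit that touches a grid line, so a morphological approach (for example with OpenCV) may work better if the digits sit close to the lines.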
