Image to text recognition using Tesseract-OCR is better when Image is preprocessed manually using Gimp than my Python Code
I am trying to write code in Python for the manual Image preprocessing and recognition using Tesseract-OCR.
Manual process:
For manually recognizing text from a single image, I preprocess the image using Gimp and create a TIF image. Then I feed it to Tesseract-OCR, which recognizes it correctly.
To preprocess the image using Gimp I do -
Then I feed it to tesseract -
$ tesseract captcha.tif output -psm 6
And I get an accurate result every time.
Python Code:
I have tried to replicate the above procedure using OpenCV and Tesseract -
import cv2
from PIL import Image
from pytesseract import image_to_string

def binarize_image_using_opencv(captcha_path, binary_image_path='input-black-n-white.jpg'):
    # cv2.CV_LOAD_IMAGE_GRAYSCALE was removed in OpenCV 3; use cv2.IMREAD_GRAYSCALE
    im_gray = cv2.imread(captcha_path, cv2.IMREAD_GRAYSCALE)
    (thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # although thresh is used below, gonna pick something suitable
    im_bw = cv2.threshold(im_gray, thresh, 255, cv2.THRESH_BINARY)[1]
    cv2.imwrite(binary_image_path, im_bw)
    return binary_image_path
def preprocess_image_using_opencv(captcha_path):
    bin_image_path = binarize_image_using_opencv(captcha_path)
    im_bin = Image.open(bin_image_path)
    basewidth = 300  # in pixels
    wpercent = basewidth / float(im_bin.size[0])
    hsize = int(float(im_bin.size[1]) * wpercent)
    big = im_bin.resize((basewidth, hsize), Image.NEAREST)
    # save the upscaled image as TIF for tesseract-ocr
    tif_file = "input-NEAREST.tif"
    big.save(tif_file)
    return tif_file
def get_captcha_text_from_captcha_image(captcha_path):
    # Preprocess the image before OCR
    tif_file = preprocess_image_using_opencv(captcha_path)
    # Perform OCR (Optical Character Recognition) using tesseract-ocr
    image = Image.open(tif_file)
    ocr_text = image_to_string(image, config="-psm 6")
    # keep only alphanumeric characters
    alphanumeric_text = ''.join(e for e in ocr_text if e.isalnum())
    return alphanumeric_text
But I am not getting the same accuracy. What did I miss?
This code is available at https://github.com/hussaintamboli/python-image-to-text
If the output deviates only minimally from your expected output (i.e. extra ', " etc., as suggested in your comments), try restricting character recognition to the character set you expect (e.g. alphanumeric).
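A minimal sketch of both options, restricting at OCR time via Tesseract's tessedit_char_whitelist config variable, or post-filtering the output in Python. The config string here is only assembled, not run (running it would require a Tesseract install and a real image):

```python
# Option 1: whitelist at OCR time. This string would be passed as the
# config argument of pytesseract.image_to_string; shown here only as a string.
ALNUM = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
config = '--psm 6 -c tessedit_char_whitelist=' + ALNUM

# Option 2: post-filter whatever Tesseract returns.
def keep_alnum(text):
    """Drop everything except letters and digits from the OCR output."""
    return ''.join(c for c in text if c.isalnum())

print(keep_alnum("8'8B C,7F\n"))  # 88BC7F
```

Option 1 prevents the stray punctuation from ever being recognized; option 2 is a cheap fallback when you cannot change the Tesseract invocation.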
You have already applied the simple thresholding. The missing part is that you need to read the digits one by one.
For each single digit:
Upsampling is required for accurate recognition, and adding a border to the image will center the digit.
Code:
import cv2
import pytesseract

img = cv2.imread('Iv5BS.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.threshold(gry, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
(h_thr, w_thr) = thr.shape[:2]

s_idx = 2
e_idx = int(w_thr / 6) - 20
result = ""

for _ in range(0, 6):
    # crop the current digit out of the thresholded image
    crp = thr[5:int((6 * h_thr) / 7), s_idx:e_idx]
    (h_crp, w_crp) = crp.shape[:2]
    # upsample 2x, then add a white border to center the digit
    crp = cv2.resize(crp, (w_crp * 2, h_crp * 2))
    crp = cv2.copyMakeBorder(crp, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=255)
    # advance the crop window to the next digit
    s_idx = e_idx
    e_idx = s_idx + int(w_thr / 6) - 7
    txt = pytesseract.image_to_string(crp, config="--psm 6")
    result += txt[0]
    cv2.imshow("crp", crp)
    cv2.waitKey(0)

print(result)
88BC7F
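The crop-window arithmetic above can be checked in isolation; w_thr = 300 below is an assumed width for illustration, not taken from the actual image:

```python
w_thr = 300  # assumed width of the thresholded image, for illustration only
s_idx = 2
e_idx = int(w_thr / 6) - 20  # first window ends a bit early
windows = []
for _ in range(6):
    windows.append((s_idx, e_idx))
    # each subsequent window starts where the previous one ended
    # and is slightly narrower than a sixth of the image width
    s_idx = e_idx
    e_idx = s_idx + int(w_thr / 6) - 7
print(windows)  # [(2, 30), (30, 73), (73, 116), (116, 159), (159, 202), (202, 245)]
```

The -20 and -7 offsets compensate for the border and the spacing between digits, which is why the six slices are not exactly w_thr / 6 wide.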