
Given an image like the one shown, how would you suggest improving character recognition using pytesseract?

The image I am testing with is shown below.

[image]

I am very new to OCR and wondered what sort of techniques I could apply to try to improve the accuracy of the method in Python, probably using PIL, though I am open to suggestions. With the raw image, no characters are recognised at all.

Apologies if the question is a little open-ended but, as I mentioned, I am very new to OCR in general.

Edit 1: as per suggestion, here is the code I have so far:

from PIL import Image
import cv2
import pytesseract

# Convert to a 1-bit black-and-white image and save it
image_file = Image.open('rsTest.jpg')
image_file = image_file.convert('1')
image_file.save('PostPro.jpg', dpi=(400, 400))
image_file.show()

# Run OCR on the saved image
new_image = Image.open('PostPro.jpg')
print(pytesseract.image_to_string(new_image))

How consistent are your images? If they all look like the one you posted, the first thing to do is crop it:

# Since you are already importing cv2
full_image = cv2.imread('rsTest.jpg')
# start_y, end_y, start_x, end_x are the pixel bounds of the text region
crop_image = full_image[start_y:end_y, start_x:end_x]
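As a minimal sketch of the cropping step: OpenCV images are NumPy arrays indexed as [row, column], that is [y, x], so the y range comes first in the slice. The array and the bounds below are made up for illustration and are not taken from the image in the question.

```python
import numpy as np

# Stand-in for cv2.imread(...): a 100x200 image (height, width, channels).
full_image = np.zeros((100, 200, 3), dtype=np.uint8)

# Hypothetical bounds of the text region, in pixels.
start_y, end_y = 20, 60
start_x, end_x = 50, 150

# Rows (y) are sliced first, then columns (x).
crop_image = full_image[start_y:end_y, start_x:end_x]
print(crop_image.shape)  # (40, 100, 3)
```

Note that the slice is a view on full_image, so later in-place edits to crop_image also change full_image unless you call .copy().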

Then you can keep just the white pixels (which are the letters) and turn everything else black.

# A pixel is "not white" if any of its channels differs from 255
crop_image[np.where((crop_image != [255, 255, 255]).any(axis=2))] = [0, 0, 0]
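One caveat, offered as an assumption rather than something checked against your image: JPEG compression often leaves letters slightly off pure white, so an exact match against [255, 255, 255] may throw away most of the text. A rough sketch of a more tolerant version follows; the function name keep_near_white and the threshold of 200 are illustrative choices, not values tuned for this picture.

```python
import numpy as np

def keep_near_white(image, threshold=200):
    """Return a copy in which pixels whose channels are all >= threshold
    are kept and every other pixel is set to black."""
    result = image.copy()
    near_white = (image >= threshold).all(axis=2)
    result[~near_white] = 0
    return result

# Tiny 2x2 example: pure white, almost white, red, dark grey.
sample = np.array([[[255, 255, 255], [250, 248, 252]],
                   [[255, 0, 0], [30, 30, 30]]], dtype=np.uint8)
cleaned = keep_near_white(sample)
```

In this example the two light pixels survive unchanged and the red and grey pixels become black. You would probably need to adjust the threshold by eye for your own images.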

Then apply OCR with Tesseract:

img = Image.fromarray(crop_image)
captchaText = pytesseract.image_to_string(img)

You would need to import cv2, numpy (as np), pytesseract and PIL.
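Putting the steps together, here is one way the whole thing might look. This is a sketch under assumptions: the helper names isolate_white_text and read_text are made up for illustration, and the crop box has to be measured from your own image. The OCR libraries are imported inside read_text only so that the array helper can be tried without Tesseract installed.

```python
import numpy as np

def isolate_white_text(image, box):
    """Crop to box = (start_y, end_y, start_x, end_x) and black out
    every pixel that is not pure white. The input is left untouched."""
    start_y, end_y, start_x, end_x = box
    crop = image[start_y:end_y, start_x:end_x].copy()
    not_white = (crop != 255).any(axis=2)
    crop[not_white] = 0
    return crop

def read_text(path, box):
    """Load an image file, isolate the white text and run Tesseract on it."""
    # Imported here so isolate_white_text works without the OCR stack.
    import cv2
    import pytesseract
    from PIL import Image

    full_image = cv2.imread(path)
    if full_image is None:  # cv2.imread returns None if the file cannot be read
        raise FileNotFoundError(path)
    cleaned = isolate_white_text(full_image, box)
    return pytesseract.image_to_string(Image.fromarray(cleaned))
```

You would call it as read_text('rsTest.jpg', (start_y, end_y, start_x, end_x)), with the four bounds filled in for your image.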
