Given an image like the one shown, how would you suggest improving character recognition with pytesseract?

Question

The image I am testing with is shown below.

I am very new to OCR and wondered what techniques I could apply to improve the accuracy of the method in Python, probably using PIL, though I am open to suggestions. With the raw image, no characters are recognised at all.

Apologies if the question is a little open-ended but, as I mentioned, I am very new to OCR in general.

Edit 1: as suggested, here is the code I have so far:

from PIL import Image
import cv2
import pytesseract

# Convert to 1-bit black and white, then save at 400 dpi
image_file = Image.open('rsTest.jpg')
image_file = image_file.convert('1')
image_file.save('PostPro.jpg', dpi=(400, 400))
image_file.show()

new_image = Image.open('PostPro.jpg')
print(pytesseract.image_to_string(new_image))

Answer 1

How consistent are your images? If they all look like the one you posted, the first thing to do is crop the image to the text region:

# Since you are already importing cv2
full_image = cv2.imread('rsTest.jpg')
# start_y, end_y, start_x, end_x are the pixel bounds of the text region
crop_image = full_image[start_y:end_y, start_x:end_x]
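As a minimal sketch of the cropping step: an image loaded with OpenCV is a NumPy array indexed as [row, column, channel], so the y range comes first in the slice. The array below is a synthetic stand-in for the result of cv2.imread, and the box coordinates are made-up values for illustration only.

```python
import numpy as np

# Synthetic stand-in for cv2.imread(...): 100 rows (height) x 200 columns (width), 3 channels.
full_image = np.zeros((100, 200, 3), dtype=np.uint8)

# Hypothetical crop box; in practice, read these off your own image.
start_x, end_x = 50, 150
start_y, end_y = 20, 60

# Rows (y) are sliced first, then columns (x).
crop_image = full_image[start_y:end_y, start_x:end_x]

print(crop_image.shape)  # (40, 100, 3)
```

Note that slicing returns a view, so in-place edits to crop_image also change full_image; call .copy() on the slice if you need the original left intact.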

Then keep only the white pixels (the letters) and turn everything else black:

# A pixel is non-white if any of its channels differs from 255
crop_image[(crop_image != [255, 255, 255]).any(axis=2)] = [0, 0, 0]
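One thing that may be worth trying if exact matching leaves too few pixels: JPEG compression often turns pure white into near-white, so a small tolerance can help. The sketch below is not part of the original answer; the helper name keep_white and its tolerance parameter are hypothetical, and the three-pixel sample is synthetic.

```python
import numpy as np

def keep_white(image, tolerance=0):
    """Return a copy where pixels within `tolerance` of pure white are kept
    and every other pixel is set to black."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    distance = 255 - image.astype(np.int16)
    # A pixel is "not white" if any channel is further than `tolerance` from 255.
    not_white = (distance > tolerance).any(axis=2)
    result = image.copy()
    result[not_white] = [0, 0, 0]
    return result

# One row of three pixels: pure white, near-white, and red.
sample = np.array([[[255, 255, 255], [250, 252, 255], [255, 0, 0]]], dtype=np.uint8)

exact = keep_white(sample)                # only the pure white pixel survives
loose = keep_white(sample, tolerance=10)  # the near-white pixel survives too
```

With tolerance=0 this behaves like an exact "keep only white" filter; the right tolerance for a given image is something to find by experiment.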

Then apply OCR with Tesseract:

img = Image.fromarray(crop_image)
captchaText = pytesseract.image_to_string(img)

You will need to import cv2, numpy (as np), pytesseract, and Image from PIL.
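Putting the steps together with their imports, a rough sketch might look like the following. The function names prepare_for_ocr and read_text and the (start_x, start_y, end_x, end_y) box layout are assumptions made for illustration, and read_text assumes that OpenCV, pytesseract, and the Tesseract binary are all installed.

```python
import numpy as np

def prepare_for_ocr(full_image, box):
    """Crop to `box` = (start_x, start_y, end_x, end_y) and blacken non-white pixels."""
    start_x, start_y, end_x, end_y = box
    # Copy the slice so the caller's array is not modified in place.
    crop = full_image[start_y:end_y, start_x:end_x].copy()
    # Any pixel with a channel other than 255 is treated as background.
    crop[(crop != 255).any(axis=2)] = 0
    return crop

def read_text(path, box):
    """Load an image from disk, preprocess it, and run Tesseract on the result."""
    # Imported here so the preprocessing above can be used without these packages.
    import cv2
    import pytesseract
    from PIL import Image

    full_image = cv2.imread(path)
    if full_image is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(path)
    return pytesseract.image_to_string(Image.fromarray(prepare_for_ocr(full_image, box)))
```

A call would then look like read_text('rsTest.jpg', (start_x, start_y, end_x, end_y)), with the box values taken from your own image.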

Answer 1
0 2017-08-17 23:25:18
