
Given an image like the one shown, how would you suggest improving character recognition using pytesseract?

The image I am testing with is shown below.

[image: enter image description here]

I am very new to OCR and wondered what sort of techniques I could apply to try and improve the accuracy of the method in Python, probably using PIL, but I am open to suggestions. With the raw image, no characters are recognised at all.

Apologies if the question is a little open-ended but, as I mentioned, I am very new to OCR in general.

Edit 1: as per the suggestion, here is the code I have so far:

from PIL import Image
import cv2
import pytesseract

# Open the test image and convert it to 1-bit black and white
image_file = Image.open('rsTest.jpg')
image_file = image_file.convert('1')
image_file.save('PostPro.jpg', dpi=(400, 400))
image_file.show()

# Run tesseract on the preprocessed image
new_image = Image.open('PostPro.jpg')
print(pytesseract.image_to_string(new_image))

How consistent are your images? If they all look like the one you posted, the first thing you need to do is crop the image:

# Since you are importing cv2, read the image as a numpy array
full_image = cv2.imread('rsTest.jpg')
# start_y, end_y, start_x, end_x are the bounds of the region containing the text
crop_image = full_image[start_y:end_y, start_x:end_x]

Then you can keep only the white pixels (which are the letters) and turn everything else black:

# Set every pixel that is not pure white to black
crop_image[np.where((crop_image != [255, 255, 255]).all(axis=2))] = [0, 0, 0]

Then apply OCR with tesseract:

# Convert the numpy array back to a PIL image and run tesseract on it
img = Image.fromarray(crop_image)
captchaText = pytesseract.image_to_string(img)

You would need to import cv2, numpy, pytesseract and PIL.
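
Putting the steps together, here is a minimal end-to-end sketch (not from the original answer); the crop bounds start_y, end_y, start_x, end_x are placeholder values that you would need to adjust to the region of your own images that contains the text:

import cv2
import numpy as np
import pytesseract
from PIL import Image

# Read the image as a numpy array (BGR channel order)
full_image = cv2.imread('rsTest.jpg')

# Placeholder crop bounds -- adjust these to your image
start_y, end_y, start_x, end_x = 0, 100, 0, 300
crop_image = full_image[start_y:end_y, start_x:end_x]

# Keep only pure white pixels (the letters); turn everything else black
crop_image[np.where((crop_image != [255, 255, 255]).all(axis=2))] = [0, 0, 0]

# Convert back to a PIL image and run tesseract
img = Image.fromarray(crop_image)
captchaText = pytesseract.image_to_string(img)
print(captchaText)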
