The image I am testing with is shown below.
I am very new to OCR and wondered what techniques I could apply to improve the accuracy of the method in Python, probably using PIL but open to suggestions. With the raw image, no characters are recognised at all.
Apologies if the question is a little open-ended, but as I mentioned, I am very new to OCR in general.
Edit 1: as per the suggestion, here is the code I have so far:
from PIL import Image
import cv2  # not used yet
import pytesseract

image_file = Image.open('rsTest.jpg')
image_file = image_file.convert('1')  # 1-bit black and white
image_file.save('PostPro.jpg', dpi=(400, 400))
image_file.show()  # show() is a method and needs parentheses

new_image = Image.open('PostPro.jpg')
print(pytesseract.image_to_string(new_image))  # print() is a function in Python 3
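One likely problem with the code above: converting to mode '1' applies dithering by default, which scatters noise around glyph edges and confuses Tesseract. A hard threshold on the grayscale channel, plus upscaling (Tesseract prefers reasonably large text), often does better. A minimal sketch, assuming PIL only; the threshold and scale values are guesses to tune per image:

```python
from PIL import Image


def binarize(img, threshold=128, scale=3):
    """Grayscale, hard-threshold, and optionally upscale an image for OCR.

    Mode '1' dithers by default, adding edge noise; thresholding the
    'L' channel avoids that. threshold and scale are per-image guesses.
    """
    gray = img.convert('L')
    # point() maps each pixel: bright pixels become white, the rest black
    bw = gray.point(lambda p: 255 if p >= threshold else 0)
    if scale > 1:
        # Upscale so small glyphs have enough pixels for Tesseract
        bw = bw.resize((bw.width * scale, bw.height * scale), Image.LANCZOS)
    return bw


# Usage (uncomment once pytesseract is installed):
# import pytesseract
# print(pytesseract.image_to_string(binarize(Image.open('rsTest.jpg'))))
```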
How consistent are your images? If they all look like the one you posted, the first thing to do is crop it:
# Since you are importing cv2
full_image = cv2.imread('rsTest.jpg')  # cv2.imread, not cv.imread
crop_image = full_image[start_y:end_y, start_x:end_x]
Then you can keep only the white pixels (which are the letters) and turn everything else black:
crop_image[np.where((crop_image != [255,255,255]).all(axis = 2))] = [0,0,0]
Then apply OCR with Tesseract:
img = Image.fromarray(crop_image)
captchaText = pytesseract.image_to_string(img)
You will need to import cv2, numpy, pytesseract, and PIL.