How to extract numbers from a complex captcha

Question

I am trying to solve the captcha in the following image.

I have tried using Tesseract (via pytesseract):

from PIL import Image
import pytesseract

# br, captchaurl and filename are defined earlier in the script
data = br.open(captchaurl).read()
with open(filename, 'wb') as save:
    save.write(data)
ctext = pytesseract.image_to_string(Image.open(filename))

Answer 1

Here is a workaround. You need to clean up the image a bit first, but you won't get a perfect result. Try the following:

import cv2
import pytesseract

src = 'sample.jpg'

# Load as grayscale and upscale 10x so the digits are large enough for Tesseract
img = cv2.imread(src, cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, None, fx=10, fy=10, interpolation=cv2.INTER_LINEAR)
# Median blur suppresses speckle noise; the threshold then binarizes the image
img = cv2.medianBlur(img, 9)
_, img = cv2.threshold(img, 185, 255, cv2.THRESH_BINARY)
# Closing fills in dark features smaller than the kernel (specks, thin lines)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4, 8))
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
cv2.imwrite('sample2.jpg', img)

# OCR the cleaned image and keep only the digits
text = pytesseract.image_to_string('sample2.jpg')
print(''.join(x for x in text if x.isdigit()))
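If stray letters still show up, another thing to try is telling Tesseract to expect a single line of digits. This is only a sketch, not part of the code above: the helper names are made up for illustration, and whether `tessedit_char_whitelist` is honoured depends on your Tesseract version (the LSTM engine in 4.0 ignores it), so the digit filter is kept as a fallback.

```python
def digits_only_config(psm: int = 7) -> str:
    """Build a Tesseract config string: single text line (psm 7), digits only."""
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def ocr_digits(path: str) -> str:
    """Hypothetical helper: OCR the image at `path` and return only digits."""
    # Imported here so digits_only_config() can be used without pytesseract
    import pytesseract

    text = pytesseract.image_to_string(path, config=digits_only_config())
    # Fallback filter in case the whitelist is ignored by the OCR engine
    return "".join(ch for ch in text if ch.isdigit())
```

With the cleaned image from above you would call `ocr_digits('sample2.jpg')`.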

Answer 2

Option 1:

I think pytesseract should solve the issue. I tried your code, and it gave me the following result when I passed the exact cropped captcha image to pytesseract:

Input Image:

Output:

print(ctext)
 '436359 oS'

I suggest you don't pass the full page URL to pytesseract. Instead, use the exact image URL, " https://i.ibb.co/RGn9fF5/Jpeg-Image-CS2.jpg ", so that only the image is read.
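A quick way to tell whether you downloaded the image itself or an HTML page is to look at the first few bytes before handing them to the OCR step. This is a minimal sketch; `looks_like_image` is a name I made up, and it only knows the JPEG, PNG and GIF signatures.

```python
def looks_like_image(data: bytes) -> bool:
    """Return True if `data` starts with a JPEG, PNG or GIF file signature."""
    signatures = (
        b"\xff\xd8\xff",       # JPEG
        b"\x89PNG\r\n\x1a\n",  # PNG
        b"GIF87a",             # GIF (1987 variant)
        b"GIF89a",             # GIF (1989 variant)
    )
    return data.startswith(signatures)


# A JPEG header passes, an HTML page does not
print(looks_like_image(b"\xff\xd8\xff\xe0" + b"\x00" * 10))  # True
print(looks_like_image(b"<!DOCTYPE html><html>"))            # False
```

In the question's code you could call this on `data` right after `br.open(captchaurl).read()` and stop early if it returns False.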

As for the extra 'oS' characters in the output, you can use string manipulation to strip everything other than digits from the result:

import re
ctext = re.sub("[^0-9]", "", ctext)
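Applied to the output I got above, the substitution works like this (a small self-contained sketch; `extract_digits` is just an illustrative name):

```python
import re


def extract_digits(ocr_text: str) -> str:
    """Remove every character that is not an ASCII digit 0-9."""
    return re.sub(r"[^0-9]", "", ocr_text)


print(extract_digits("436359 oS"))  # 436359
```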

Option 2:

You can also use Google's OCR for this, which gave me the exact result without errors. Although I have only shown its web interface, Google provides Python client libraries, so you can do the same from Python itself.
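From Python, a rough sketch with the Cloud Vision client library could look like the following. I am assuming the `google-cloud-vision` package (2.x or later) is installed and that credentials are already configured in your environment; the function names are made up for illustration and I have not run this against the captcha in the question.

```python
def keep_digits(text: str) -> str:
    """Keep only the digit characters from the OCR output."""
    return "".join(ch for ch in text if ch.isdigit())


def google_ocr_digits(image_bytes: bytes) -> str:
    """Hypothetical helper: send image bytes to Cloud Vision, return the digits."""
    # Assumption: google-cloud-vision is installed and credentials are set up
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=image_bytes))
    if response.error.message:
        raise RuntimeError(response.error.message)
    # When text is found, the first annotation holds the full detected text
    annotations = response.text_annotations
    full_text = annotations[0].description if annotations else ""
    return keep_digits(full_text)
```

You would pass in the raw bytes of the captcha image, for example the `data` variable from the question.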
