
Using Python and Tesseract OCR to solve Captcha

I am not planning to spam; besides, Google has made captcha obsolete with reCAPTCHA. I am doing this as a project to learn more about OCR and, eventually, maybe neural networks.

So I have an image from a captcha. I have been able to make modest progress, but Tesseract isn't exactly well documented. Here is the code I have so far, and the results are below it.

import pytesseract
from pytesseract import image_to_string
from PIL import Image, ImageEnhance, ImageFilter


def ParsePic():
    # Point pytesseract at the Tesseract executable (Windows default install path)
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
    im = Image.open("path\\screenshot.png")
    # Edge/detail filters: outline shapes, then sharpen fine detail
    im = im.filter(ImageFilter.CONTOUR)
    im = im.filter(ImageFilter.DETAIL)
    # Boost contrast by a factor of 4, then convert to 8-bit grayscale
    im = ImageEnhance.Contrast(im).enhance(4)
    im = im.convert('L')
    # Keep a copy of the preprocessed image for inspection
    im.save('temp10.png')
    # image_to_string accepts a PIL image directly, so there is no need to re-open the file
    text = image_to_string(im)
    print(text)
    return text


if __name__ == '__main__':
    ParsePic()

[Image: original captcha]

[Image: output]

I understand that captchas were made specifically to defeat OCR, but I read that this is no longer the case, and I'm interested in learning how it was done.

My question is, how do I make the background the same color, so the text becomes easily readable?

Late answer, but anyway... You are doing edge detection, but there are obviously too many edges in this image, so this will not work. You will have to do something with the colors. I don't know if this is true for all of your captchas, but here you can just use contrast. You can test this by opening your original in Paint (or any other image editor) and saving the image as "monochrome" (black and white only, NOT grayscale).
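The "save as monochrome" step can be approximated in Pillow. This is a minimal sketch, assuming a fixed cutoff of 128 (a hypothetical starting value to tune per captcha); Paint's exact conversion is not known here, so this only approximates it. One caveat: Pillow's `convert("1")` dithers by default, which scatters noise dots over the background, so the sketch instead maps each grayscale value through a 256-entry lookup table with `Image.point()` to get a hard black/white threshold.

```python
def monochrome_lut(threshold=128):
    """Build a 256-entry lookup table: values above `threshold` become white (255),
    everything else becomes black (0). No intermediate grays, no dithering."""
    return [255 if value > threshold else 0 for value in range(256)]


def to_monochrome(im, threshold=128):
    """Convert a PIL image to pure black-and-white (NOT grayscale).

    `im` is any PIL.Image.Image; `threshold=128` is an assumed default.
    """
    gray = im.convert("L")                         # collapse colors to 0-255 brightness
    return gray.point(monochrome_lut(threshold))   # hard threshold; result is still mode "L"
```

For example, `to_monochrome(Image.open("path\\screenshot.png")).save("monochrome.png")` writes the black-and-white version so you can compare it with Paint's result. Keeping the output in mode "L" (values 0 and 255 only) avoids Pillow's dithering path entirely.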

Result: [image] without any other editing! (Even the question mark is gone.)

This would be ready to OCR right away.
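A hedged sketch of that OCR step, assuming Pillow and pytesseract are installed and that the captcha is a single line of letters and digits. `ocr_captcha` and `clean_ocr_text` are hypothetical helper names; `--psm 7` is Tesseract's "treat the image as a single text line" page segmentation mode, and the threshold of 128 is an assumed starting point. The imports sit inside `ocr_captcha` so the cleanup helper works without Tesseract installed.

```python
import re


def clean_ocr_text(raw):
    """Keep only letters and digits from Tesseract's raw output.

    Raw output typically ends with newlines (and may include a form feed);
    captcha answers are assumed here to be purely alphanumeric.
    """
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def ocr_captcha(path, threshold=128):
    """Binarize the image at `path`, then OCR it as one line of text.

    Assumes Tesseract is on PATH, or that pytesseract.pytesseract.tesseract_cmd
    has been set as in the question.
    """
    from PIL import Image
    import pytesseract

    gray = Image.open(path).convert("L")
    # Hard black/white threshold via a 256-entry lookup table
    bw = gray.point([255 if v > threshold else 0 for v in range(256)])
    # --psm 7: single text line (an assumption about the captcha layout)
    raw = pytesseract.image_to_string(bw, config="--psm 7")
    return clean_ocr_text(raw)
```

Stripping non-alphanumerics is a design choice for this kind of captcha; drop the filter if your answers can contain punctuation.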

Maybe your other images are not this easy, but color/contrast is the way to go for you. If you need ideas on how to use color, contrast and other techniques to solve captchas, you can take a look at harder examples and how I solved them here: https://github.com/cracker0dks/CaptchaSolver
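Two color/contrast ideas, as a hedged sketch (these are not the methods from the linked repository). Otsu's method picks the monochrome cutoff automatically from the grayscale histogram, so a single fixed threshold need not fit every captcha. A color-distance mask keeps only pixels near a chosen text color and paints everything else white, which directly makes the background one color. The `target` color and `tolerance` are hypothetical and would have to be read off your own images.

```python
def otsu_threshold(histogram):
    """Return the cutoff t (0-255) that best separates dark and light pixels.

    `histogram` is a list of 256 pixel counts, e.g. from
    `im.convert("L").histogram()`. Pixels > t would become white.
    """
    total = sum(histogram)
    sum_all = sum(i * count for i, count in enumerate(histogram))
    weight_low = 0   # number of pixels with value <= t
    sum_low = 0      # sum of those pixel values
    best_t, best_score = 0, -1.0
    for t, count in enumerate(histogram):
        weight_low += count
        if weight_low == 0:
            continue
        weight_high = total - weight_low
        if weight_high == 0:
            break
        sum_low += t * count
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Proportional to the between-class variance; Otsu maximizes this
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def keep_color(pixels, target, tolerance=60):
    """Map RGB pixels close to `target` to black (text), all others to white.

    `pixels` is a sequence of (r, g, b) tuples, e.g. from
    `list(im.convert("RGB").getdata())`. Returns a list of 0/255 values.
    """
    limit = tolerance * tolerance
    out = []
    for r, g, b in pixels:
        dist2 = (r - target[0]) ** 2 + (g - target[1]) ** 2 + (b - target[2]) ** 2
        out.append(0 if dist2 <= limit else 255)
    return out
```

To turn the mask back into an image, create one with `Image.new("L", im.size)` and fill it with `putdata(...)`; the Otsu cutoff can be passed to a lookup-table threshold in place of a fixed value.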

cheers
