
python captcha decoder library

I need a Captcha decoder for python to read simple image captchas like the following picture:

[Images: example captchas]

Do you know of a library that can help me read this captcha?

If you don't know of a library for reading captchas, could you help me to read this (and others like this) with PIL?

I hope this captcha is not used anywhere.

The following is a naive way to decode it. Basically, what you need are the patterns for the digits 0 to 9 as they appear in these captchas. From your examples, I only have the patterns for 0, 3, 4, 5, 7 and 8. Since the layout is fixed, you know where to split each character. You also know each character is a digit of fixed size and fixed font. If the captcha also includes letters or other characters, but still of fixed size and font, the following code can be easily adapted.

What the code does is: a) load the patterns (I assumed they are named n0.png, n1.png, ...); b) split the captcha into NUMS pieces; c) compute a sum of squared differences between each pattern and each split number; d) decide that the split number is the one with the smallest sum. It returns a list of the numbers present in the captcha, in order. To obtain the initial patterns, you can uncomment the lines that save the split numbers, place a return after that piece, and adjust the file names.

import sys
from PIL import Image, ImageOps

PAT_SIZE = (8, 10)        # (width, height) of one digit, in pixels
NUMS = 3                  # number of digits in the captcha
FIRST_NUM_OFFSET = 5      # x coordinate where the first digit starts
NUM_OFFSET = (1, 3)       # (horizontal gap between digits, top margin)


# Load the known digit patterns; a missing file leaves a None placeholder.
NUMBERS = []
for i in range(10):
    try:
        NUMBERS.append(Image.open('n%d.png' % i).load())
    except IOError:
        print("I do not know the pattern for the number %d." % i)
        NUMBERS.append(None)


def magic(fname):
    captcha = ImageOps.grayscale(Image.open(fname))

    # Split numbers
    num = []
    for n in range(NUMS):
        x1, y1 = (FIRST_NUM_OFFSET + n * (NUM_OFFSET[0] + PAT_SIZE[0]),
                NUM_OFFSET[1])
        num.append(captcha.crop((x1, y1, x1 + PAT_SIZE[0], y1 + PAT_SIZE[1])))

    # If you want to save the split numbers:
    #for i, n in enumerate(num):
    #    n.save('%d.png' % i)

    def sqdiff(a, b):
        # XXX This is here just to handle a missing pattern.
        if a is None or b is None:
            return float('inf')

        d = 0
        for x in range(PAT_SIZE[0]):
            for y in range(PAT_SIZE[1]):
                d += (a[x, y] - b[x, y]) ** 2
        return d

    # Calculate a naive sum of squared differences between the patterns
    # and each number. We assume the smallest diff is the number in the
    # "captcha".
    result = []
    for n in num:
        n_sqdiff = [(sqdiff(p, n.load()), i) for i, p in enumerate(NUMBERS)]
        result.append(min(n_sqdiff)[1])
    return result


if __name__ == '__main__':
    print(magic(sys.argv[1]))
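
To make the split geometry concrete, here is a small sketch (written for Python 3) that only computes the crop boxes implied by the constants above, without touching any image. The constants are copied from the script; `digit_boxes` and `pattern_names` are hypothetical helper names introduced here for illustration, and "407" is just an assumed hand-read captcha value.

```python
# Geometry constants copied from the script above.
PAT_SIZE = (8, 10)        # (width, height) of one digit
NUMS = 3                  # digits per captcha
FIRST_NUM_OFFSET = 5      # x coordinate of the first digit
NUM_OFFSET = (1, 3)       # (horizontal gap between digits, top margin)


def digit_boxes():
    """Return one (left, upper, right, lower) crop box per digit."""
    boxes = []
    for n in range(NUMS):
        x1 = FIRST_NUM_OFFSET + n * (NUM_OFFSET[0] + PAT_SIZE[0])
        y1 = NUM_OFFSET[1]
        boxes.append((x1, y1, x1 + PAT_SIZE[0], y1 + PAT_SIZE[1]))
    return boxes


def pattern_names(known_digits):
    """Map each split piece to the pattern file it should be saved as.

    known_digits is the value of the captcha as read by a human.
    """
    return ['n%s.png' % d for d in known_digits]


print(digit_boxes())         # [(5, 3, 13, 13), (14, 3, 22, 13), (23, 3, 31, 13)]
print(pattern_names("407"))  # ['n4.png', 'n0.png', 'n7.png']
```

Each box can be passed directly to `captcha.crop(...)`, and the file names show how the saved pieces would be renamed once you know which digits the captcha contained.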

It is a nice project to do for academic reasons; I was interested in this a while ago. You have a few options:

  1. You write your own with the help from this site: http://www.wausita.com/captcha/

  2. You use OpenCV to do the matching.
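
As a rough illustration of what template matching does, here is a dependency-free sketch that slides a small template over an image and keeps the window with the lowest sum of squared differences. It is not OpenCV code: OpenCV's `cv2.matchTemplate` with the `cv2.TM_SQDIFF` method computes the same kind of score much faster. The function names and the tiny pixel grids below are made up for the example, and images are assumed to be plain lists of grayscale rows.

```python
def ssd_at(image, template, left, top):
    """Sum of squared differences between template and the image window at (left, top)."""
    total = 0
    for y, row in enumerate(template):
        for x, value in enumerate(row):
            total += (image[top + y][left + x] - value) ** 2
    return total


def best_match(image, template):
    """Slide template over image; return (score, left, top) of the best window."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    for top in range(img_h - tpl_h + 1):
        for left in range(img_w - tpl_w + 1):
            score = ssd_at(image, template, left, top)
            if best is None or score < best[0]:
                best = (score, left, top)
    return best


# Tiny grayscale example: 0 is ink, 255 is background.
image = [
    [255, 255, 255, 255, 255],
    [255, 255,   0, 255, 255],
    [255, 255,   0,   0, 255],
    [255, 255, 255, 255, 255],
]
template = [
    [0, 255],
    [0,   0],
]
print(best_match(image, template))  # (0, 2, 1)
```

A sliding search like this matters when the characters are not at fixed positions; for captchas with a fixed layout, comparing at known offsets is enough.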

I think there was a dedicated library for neural-network image matching, but I can't seem to find it.

Basically, as the others said, you want to remove the noise, split the image into single characters, and compare each one to the model characters using a technique of your choice.
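
The first two steps can be sketched in plain Python as follows. This is only one possible approach, under some assumptions not stated above: the image is a list of grayscale rows, the ink is darker than the background, a fixed threshold of 128 is good enough to drop light noise, and characters are separated by at least one empty column. `binarize` and `split_columns` are hypothetical names.

```python
def binarize(pixels, threshold=128):
    """Turn a grayscale grid into 1 (ink) / 0 (background)."""
    return [[1 if value < threshold else 0 for value in row] for row in pixels]


def split_columns(binary):
    """Return (start, end) column ranges that contain ink; end is exclusive."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins here
        elif not ink and start is not None:
            spans.append((start, x))       # the character ended at x - 1
            start = None
    if start is not None:
        spans.append((start, width))       # character touches the right edge
    return spans


# Two dark strokes plus one light-gray noise pixel (200) between them.
pixels = [
    [255,  20,  30, 255, 200,  10, 255],
    [255,  25, 255, 255, 255,  15, 255],
    [255,  20,  30, 255, 255,  10, 255],
]
binary = binarize(pixels)
print(split_columns(binary))  # [(1, 3), (5, 6)]
```

The light-gray pixel disappears after thresholding, so only the two real strokes are reported. Heavier noise or touching characters would need something more careful than this.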

I hope you are using it in good faith and are not going to harm (or spam) anyone.

I won't write the script for you or point you to an external plugin. But in case you are writing this on your own, this may help:

  • If you are trying to decode one specific captcha pattern, collect all of its characters (from the examples you attached it is only digits, so it shouldn't be a lot of work).
  • Put all of the characters in one file and analyze it with PIL.
  • Save each character in an array, along with its position and its meaning.
  • Get a captcha image and clear the background noise if necessary.
  • Split the captcha image into character-sized pieces and look each piece up in your self-made dictionary of characters.
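
The last step, looking pieces up in a self-made dictionary, could look roughly like this. It is a minimal sketch and not a complete decoder: the 3x3 glyphs in `TEMPLATES` are invented stand-ins for real hand-labelled characters, glyphs are assumed to be already binarized (1 = ink) and all the same size, and the function names are hypothetical.

```python
# Hypothetical hand-labelled templates: 3x3 binary glyphs, 1 = ink.
TEMPLATES = {
    '1': ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    '7': ((1, 1, 1),
          (0, 0, 1),
          (0, 0, 1)),
}


def mismatches(a, b):
    """Count the pixels where two same-sized binary glyphs differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def classify(glyph):
    """Return the label of the template closest to glyph."""
    return min(TEMPLATES, key=lambda label: mismatches(TEMPLATES[label], glyph))


def decode(glyphs):
    """Classify each split glyph in order and join the labels."""
    return ''.join(classify(g) for g in glyphs)


noisy_seven = ((1, 1, 1),
               (0, 0, 1),
               (0, 1, 1))   # one stray pixel compared with the '7' template
clean_one = TEMPLATES['1']
print(decode([noisy_seven, clean_one]))  # 71
```

Counting mismatched pixels tolerates a little leftover noise, since the closest template still wins even when a few pixels are wrong.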


 