
Segmenting letters in a Captcha image

I've written this algorithm in Python for reading CAPTCHAs using scikit-image:

from skimage.color import rgb2gray
from skimage import io

def process(self, image):
    """
    Processes a CAPTCHA by removing noise

    Args:
        image (str): The file path of the image to process
    """

    img = io.imread(image)
    histogram = {}

    # Count how often each RGB color appears in the image
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            pixel = img[x, y]
            color = '%02x%02x%02x' % (pixel[0], pixel[1], pixel[2])
            histogram[color] = histogram.get(color, 0) + 1

    # Rank colors from most to least frequent
    ranked = sorted(histogram, key=histogram.get, reverse=True)
    threshold = len(ranked) * 0.015

    # Blank out the background (the 3 most frequent colors) and rare noise colors
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            pixel = img[x, y]
            color = '%02x%02x%02x' % (pixel[0], pixel[1], pixel[2])
            index = ranked.index(color)

            if index < 3 or index > threshold:
                img[x, y] = [255, 255, 255, 255]  # 4-channel write: assumes an RGBA source image

    # Invert, drop the alpha channel and convert to grayscale before overwriting the file
    img = rgb2gray(~img[:, :, :3])
    io.imsave(image, img)

Before:

[original CAPTCHA image]

After:

[cleaned CAPTCHA image]

It works fairly well and I get decent results after running it through Google's Tesseract OCR, but I want to make it better. I think that straightening the letters would yield a much better result. My question is: how do I do that?

I understand I need to box the letters somehow, like so:

[image with each letter boxed]

Then, for each character, rotate it some number of degrees based on a vertical or horizontal line.

My initial thought was to identify the center of a character (possibly by finding clusters of the most used colors in the histogram) and then expand a box until it found black, but again, I'm not so sure how to go about doing that.

What are some common practices used in image segmentation to achieve this result?
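For context, the closest common practice I've found for boxing the letters is connected-component labeling with skimage.measure: binarize the cleaned image, label each blob, and take the bounding box of every label. A rough sketch, assuming the cleaned image saved by process() above (the file name and the area cutoff are placeholders):

from skimage import io, measure

# Load the cleaned, grayscale CAPTCHA written out by process()
img = io.imread('captcha.png', as_gray=True)

# Foreground mask: the letters are assumed to be the bright pixels here
# (flip the comparison if the cleaned image is dark-on-light instead)
binary = img > 0.5

# Label connected components and take one bounding box per component
labels = measure.label(binary)
for region in measure.regionprops(labels):
    if region.area < 20:  # ignore tiny specks of leftover noise
        continue
    min_row, min_col, max_row, max_col = region.bbox
    letter = binary[min_row:max_row, min_col:max_col]
    # letter is one boxed character, ready to be straightened and OCR'd

Note that letters which touch each other come out as a single component, which is part of what I'm unsure how to handle.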

Edit:

In the end, further refining the color filters and restricting Tesseract to a fixed set of characters yielded a nearly 100% accurate result without any deskewing.
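For reference, restricting Tesseract to a character set is normally done through its tessedit_char_whitelist variable. A minimal sketch, assuming the pytesseract wrapper and an example alphanumeric whitelist (neither is specified above):

import pytesseract
from PIL import Image

# Treat the image as a single text line (--psm 7) and only allow A-Z and 0-9
config = '--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
print(pytesseract.image_to_string(Image.open('captcha.png'), config=config))

Depending on the Tesseract version, the whitelist may only take effect with the legacy engine (--oem 0).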

The operation you want to perform is known in computer vision as deskewing. To deskew an object, you apply a geometric transformation to it. Here is a snippet that deskews a binary object, using the OpenCV library:

import cv2
import numpy as np

def deskew(image):
    # image: a single-channel (binary) character image
    (h, w) = image.shape[:2]
    moments = cv2.moments(image)
    if abs(moments["mu02"]) < 1e-2:
        return image.copy()  # no measurable skew to correct
    skew = moments["mu11"] / moments["mu02"]
    # Shear horizontally to undo the estimated skew
    M = np.float32([[1, skew, -0.5 * w * skew], [0, 1, 0]])
    return cv2.warpAffine(image, M, (w, h), flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
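A possible way to apply it to each cropped character (the file names are placeholders; the input should be a single-channel binary crop):

import cv2

# Load one boxed character as a single-channel image and straighten it
crop = cv2.imread('letter.png', cv2.IMREAD_GRAYSCALE)
straightened = deskew(crop)
cv2.imwrite('letter_deskewed.png', straightened)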
