
Local Contrast Enhancement for Digit Recognition with cv2 / pytesseract

I want to use pytesseract to read digits from images. The images look as follows:

[image: first example image]

[image: second example image]

The digits are dotted, and in order to use pytesseract I need black, connected digits on a white background. To do so, I thought about using erode and dilate as preprocessing techniques. As you can see, the images are similar, yet quite different in certain aspects. For example, the dots in the first image are darker than the background, while the dots in the second are brighter. That means in the first image I can use erode to get black connected lines, and in the second image I can use dilate to get white connected lines and then invert the colors. This leads to the following results:

[image: result for the first image]

[image: result for the second image]
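For reference, the erode / dilate preprocessing described above might look roughly like the sketch below; the file names, the 3x3 kernel and the single iteration are only assumptions and would need tuning to the actual dot spacing.

import cv2
import numpy as np

# Hypothetical file names standing in for the two example images:
first = cv2.imread("first.png", cv2.IMREAD_GRAYSCALE)
second = cv2.imread("second.png", cv2.IMREAD_GRAYSCALE)

# Assumed 3x3 structuring element; the right size depends on the dot spacing:
kernel = np.ones((3, 3), np.uint8)

# First image: the dots are darker than the background, so erosion
# (a local minimum) grows the dark dots into connected strokes:
firstConnected = cv2.erode(first, kernel, iterations=1)

# Second image: the dots are brighter than the background, so dilation
# (a local maximum) grows the bright dots; invert afterwards to get
# black digits on a white background:
secondConnected = 255 - cv2.dilate(second, kernel, iterations=1)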

Using an appropriate threshold, the first image can easily be read with pytesseract. The second image, however, is more tricky. The problem is that, for example, parts of the "4" are darker than the background around the "3". So a simple global threshold is not going to work; I need something like a local threshold or local contrast enhancement. Does anybody have an idea here?

Edit:

Otsu, mean threshold and Gaussian threshold lead to the following results:

[image: thresholding results]
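For reference, those three variants correspond to calls along these lines (a sketch; the block size and the constant C used for the adaptive methods are assumptions and need tuning per image):

import cv2

# Assumed grayscale input (the second example image):
gray = cv2.imread("second.png", cv2.IMREAD_GRAYSCALE)

# Global Otsu threshold:
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local thresholds: each pixel is compared against the mean (or
# Gaussian-weighted mean) of its neighborhood, minus a constant C:
meanThresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 31, 5)
gaussThresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY, 31, 5)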

Your images are pretty low-res, but you can try a method called gain division. The idea is that you build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant throughout most of the image.

After gain division is performed, you can try to improve the image by applying an area filter and morphology. I only tried your first image, because it is the "least worst".

These are the steps to get the gain-divided image:

  1. Apply a soft median blur filter to get rid of high-frequency noise.
  2. Get the model of the background via a local maximum. Apply a very strong close operation with a big structuring element (I'm using a rectangular kernel of size 15).
  3. Perform the gain adjustment by dividing 255 by each local-maximum pixel and weighting the result with each input image pixel.
  4. You should get a nice image where the background illumination is pretty much normalized. Threshold this image to get a binary mask of the characters.

Now, you can improve the quality of the image with the following additional steps:

  1. Threshold via Otsu, but add a little bit of bias. (This, unfortunately, is a manual step depending on the input.)

  2. Apply an area filter to filter out the smaller blobs of noise.

Let's see the code:

import numpy as np
import cv2

# image path
path = "C:/opencvImages/"
fileName = "iA904.png"

# Reading an image in default mode:
inputImage = cv2.imread(path+fileName)

# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)

# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)

# Perform gain division
gainDivision = np.where(localMax == 0, 0, (inputImage/localMax))

# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)

# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8") 

# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)

This is what gain division gets you:

Note that the lighting is more balanced. Now, let's apply a little bit of contrast enhancement:

# Contrast Enhancement:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))

You get this, which creates a little bit more contrast between the foreground and the background:

Now, let's try to threshold this image to get a nice binary mask. As I suggested, try Otsu's thresholding, but add (or subtract) a little bit of bias to the result. This step, as mentioned, depends on the quality of your input:

# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)

threshValue = 0.9 * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)

You end up with this binary mask:

Invert this and filter out the small blobs. I set an area threshold value of 10 pixels:

# Invert image:
binaryImage = 255 - binaryImage

# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)

# Set the minimum pixels for the area filter:
minArea = 10

# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]

# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype("uint8")

And this is the final binary mask:

If you plan on sending this image to an OCR, you might want to apply some morphology first, maybe a closing, to try to join the dots that make up the characters. Also be sure to train your OCR classifier with a font that is close to what you are actually trying to recognize. This is the (inverted) mask after a size-3 rectangular closing operation with 3 iterations:

Edit:

To get the last image, process the filtered output as follows:

# Set kernel (structuring element) size:
kernelSize = 3

# Set operation iterations:
opIterations = 3

# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)

# Invert image to obtain black numbers on white background:
closingImage = 255 - closingImage
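From here, the cleaned-up image could be handed to pytesseract. The snippet below is a sketch rather than part of the original answer; the --psm mode and the digit whitelist are assumptions that usually help for short numeric strings:

import pytesseract

# closingImage holds black digits on a white background (uint8).
# Restrict Tesseract to digits and treat the image as a single text line:
ocrConfig = "--psm 7 -c tessedit_char_whitelist=0123456789"
recognizedText = pytesseract.image_to_string(closingImage, config=ocrConfig)
print(recognizedText)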
