Local contrast enhancement for digit recognition with cv2 / pytesseract

Question

I want to read digits from images using pytesseract. The images look like this:

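The digits are dotted, and to use pytesseract I need black, connected digits on a white background. To get there, I thought of using erode and dilate as preprocessing techniques. As you can see, the images are similar but differ in important ways. For example, in the first image the dots are darker than the background, while in the second image they are lighter. This means that in the first image I can use erode to get black connected lines, and in the second image I can use dilate to get white connected lines and then invert the colors. This leads to the following results: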

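With an appropriate threshold, the first image can easily be read with pytesseract. The second image, however, is trickier. The problem is that, for example, parts of the "4" are darker than the background around the "3", so a simple global threshold won't work. I need something like local thresholding or local contrast enhancement. Does anyone have an idea?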

Edit:

Otsu, mean thresholding, and Gaussian thresholding give the following results:

Answer 1

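Your images have very low resolution, but you could try a method called gain division. The idea is to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image.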
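Written as a formula that follows the code later in this answer: let $I$ be the input, $\tilde I$ its $5\times5$ median, $K_{15}$ the $15\times15$ rectangular structuring element and $\bullet$ morphological closing (this notation is introduced here for illustration only). Then the background model $B$ and the gain-divided image $G$ are

```latex
B = \tilde I \bullet K_{15}, \qquad
G(x,y) =
\begin{cases}
\min\!\left(255,\; 255\,\dfrac{I(x,y)}{B(x,y)}\right), & B(x,y) > 0,\\[4pt]
0, & B(x,y) = 0.
\end{cases}
```

On background pixels $B \approx I$, so $G \approx 255$ there, while a dark dot maps to the fraction $I/B$ of 255 regardless of how bright its surroundings are.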

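After performing gain division, you can try to improve the image by applying an area filter and morphology. I only tried your first image, because it is the "worst" one.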

These are the steps to get the gain-divided image:

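Apply a soft median blur filter to remove high-frequency noise.
Get a model of the background via the local maximum: apply a very strong close operation with a large structuring element (I used a rectangular kernel of size 15).
Perform the gain adjustment by dividing 255 by each local-maximum pixel, then weight this value by the corresponding input pixel.
You should get a nice image in which the background illumination is largely normalized; threshold this image to get a binary mask of the characters.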

Now you can improve the image quality with the following additional steps:

Threshold via Otsu, but add a little bias. (Unfortunately, this is a manual step that depends on the input.)
Apply an area filter to remove the smaller noise blobs.

Let's see the code:

import numpy as np
import cv2

# image path
path = "C:/opencvImages/"
fileName = "iA904.png"

# Reading an image in default (BGR color) mode:
inputImage = cv2.imread(path + fileName)
if inputImage is None:
    raise FileNotFoundError(path + fileName)

# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)

# Get the local maximum (bright background model) via a strong closing:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, iterations=1, borderType=cv2.BORDER_REFLECT101)

# Perform gain division (pixels whose background is 0 stay 0,
# and no division by zero is ever evaluated):
localMax = localMax.astype(np.float64)
gainDivision = np.divide(inputImage, localMax, out=np.zeros_like(localMax), where=localMax != 0)

# Scale to [0,255] and clip:
gainDivision = np.clip((255 * gainDivision), 0, 255)

# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8")

# Convert BGR (OpenCV's default channel order) to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)

This is what gain division gives you:

Note that the lighting is much more balanced. Now, let's apply a bit of contrast enhancement:

# Contrast Enhancement:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))

You get this, which has more contrast between the foreground and the background:

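Now, let's threshold this image to get a clean binary mask. As I suggested, try Otsu's thresholding, but add (or subtract) a little bias to the result. As mentioned before, this step depends on the quality of your input: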

# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)

threshValue = 0.9 * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)

You end up with this binary mask:

Invert it and filter out the small blobs. I set the area threshold to 10 pixels:

# Invert image:
binaryImage = 255 - binaryImage

# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)

# Set the minimum pixels for the area filter:
minArea = 10

# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i, cv2.CC_STAT_AREA] >= minArea]

# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype("uint8")

This is the final binary mask:

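If you plan to send this image to an OCR engine, you might want to apply some morphology first, for example a closing to join the dots that make up each character. Also make sure you train your OCR classifier with a font close to the one you are actually trying to recognize. This is the (inverted) mask after a rectangular closing with a kernel of size 3 and 3 iterations: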

Edit:

To get the last image, process the filtered output as follows:

# Set kernel (structuring element) size:
kernelSize = 3

# Set operation iterations:
opIterations = 3

# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)

# Invert image to obtain black numbers on white background:
closingImage = 255 - closingImage

Answer 1: 3 votes, accepted, 2021-01-11 22:49:19
