Local contrast enhancement for digit recognition with cv2 / pytesseract

Question

I want to use pytesseract to read digits from images. The images look like this:

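The digits are dotted, and to use pytesseract I need connected black digits on a white background. To get there, I considered erode and dilate as preprocessing steps. As you can see, the images are similar but differ in important ways. For example, in the first image the dots are darker than the background, while in the second image the dots are lighter. This means that in the first image I can use erode to get connected black lines, and in the second image I can use dilate to get connected white lines and then invert the colors. This gives the following results: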

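With a suitable threshold, pytesseract can read the first image easily. The second image, however, is trickier. The problem is that, for example, parts of the "4" are darker than the background around the "3", so a simple global threshold will not work. I need something like local thresholding or local contrast enhancement. Does anyone have an idea?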

Edit:

Otsu, mean thresholding and Gaussian thresholding lead to the following results:

Answer 1

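Your images are pretty low-resolution, but you can try a method called gain division. The idea is to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image.

After gain division, you can try to improve the image by applying an area filter and morphology. I only tried your first image, because it is the "worst" one.

These are the steps to get the gain-divided image:

Apply a soft median blur filter to get rid of high-frequency noise.
Get the model of the background via local maximum. Apply a very strong close operation with a big structuring element (I'm using a rectangular kernel of size 15).
Perform gain adjustment by dividing 255 by each local maximum pixel. Weight this value with each input image pixel.
You should get a nice image where the background illumination is pretty much normalized; threshold this image to get a binary mask of the characters.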
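
To make the arithmetic behind the local-maximum and gain steps concrete, here is a minimal NumPy-only sketch. It is not the cv2 pipeline from this answer: the helper names (window_filter, closing, gain_divide) and the synthetic test image are made up for illustration, the median blur is skipped, and the closing is a plain sliding-window max followed by a sliding-window min.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_filter(img, k, reducer):
    """Apply reducer (np.max or np.min) over every k x k neighbourhood."""
    pad = k // 2
    # "reflect" mirrors the image without repeating the edge pixel.
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (k, k))
    return reducer(windows, axis=(-2, -1))


def closing(img, k):
    """Grayscale closing: local maximum followed by local minimum."""
    return window_filter(window_filter(img, k, np.max), k, np.min)


def gain_divide(gray, k=15):
    """Divide each pixel by the estimated background, rescale to [0, 255]."""
    background = closing(gray, k).astype(np.float64)
    ratio = np.divide(gray, background,
                      out=np.zeros_like(background), where=background > 0)
    return np.clip(255.0 * ratio, 0, 255).astype(np.uint8)


# Synthetic image (an assumption for this sketch): illumination rising from
# left to right, with isolated dark dots at 40% of the local brightness.
height, width = 40, 64
image = np.tile(np.linspace(100.0, 220.0, width), (height, 1))
dots = np.zeros((height, width), dtype=bool)
dots[12:28:4, 8:56:4] = True
image[dots] *= 0.4
image = image.astype(np.uint8)

flattened = gain_divide(image)
print("background before:", image[~dots].min(), "to", image[~dots].max())
print("background after: ", flattened[~dots].min(), "to", flattened[~dots].max())
```

On this toy input the background spread shrinks a lot while the dots stay clearly darker, which is what makes a single global threshold workable afterwards. Real images will be noisier, so treat this only as an illustration of the idea.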

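Now you can improve the quality of the image with the following additional steps:

Threshold via Otsu, but add a little bias. (Unfortunately, this is a manual step that depends on the input.)
Apply an area filter to remove the smaller noise blobs.

Let's see the code: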

import numpy as np
import cv2

# image path
path = "C:/opencvImages/"
fileName = "iA904.png"

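# Read the image in default (BGR color) mode:
inputImage = cv2.imread(path+fileName)

# cv2.imread returns None instead of raising when the file cannot be read:
if inputImage is None:
    raise FileNotFoundError(path + fileName)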

# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)

# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)

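# Perform gain division (pixels whose local maximum is 0 stay 0,
# without triggering a divide-by-zero warning):
gainDivision = np.divide(inputImage, localMax, out=np.zeros(inputImage.shape, dtype=np.float64), where=localMax != 0)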

# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)

# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8") 

# Convert BGR (OpenCV's channel order) to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)

This is what gain division gets you:

Notice that the illumination is more balanced. Now, let's apply a little contrast enhancement:

# Contrast Enhancement:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))
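
As an aside, min-max normalization maps the darkest pixel to the lower bound and the brightest pixel to the upper bound, scaling everything in between linearly. Here is a NumPy sketch of that arithmetic; the function name stretch_min_max is hypothetical, and returning the lower bound for a completely flat image is my own choice, not something taken from OpenCV.

```python
import numpy as np


def stretch_min_max(gray, low=0, high=255):
    """Linearly rescale gray so its minimum maps to low and its maximum to high."""
    gray = gray.astype(np.float64)
    g_min, g_max = gray.min(), gray.max()
    if g_max == g_min:
        # Flat image: nothing to stretch (assumption: return the lower bound).
        return np.full(gray.shape, low, dtype=np.uint8)
    scaled = (gray - g_min) * (high - low) / (g_max - g_min) + low
    return np.rint(scaled).astype(np.uint8)


sample = np.array([[100, 150], [200, 250]], dtype=np.uint8)
stretched = stretch_min_max(sample)
print(stretched)
```

The stretch only helps when the image does not already span the full 0 to 255 range; a single very dark or very bright outlier pixel can make it almost a no-op.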

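You get this, which has a bit more contrast between foreground and background:

Now, let's try thresholding this image to get a nice binary mask. As I suggested, try Otsu's thresholding, but add (or subtract) a little bias to the result. As mentioned, this step depends on the quality of your input: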

# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)

threshValue = 0.9 * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)
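
If you want to see what the Otsu step and the bias do numerically, the following NumPy-only sketch may help. It is an illustration under my own assumptions: otsu_threshold and threshold_with_bias are hypothetical helper names, the pixel values are invented, and the threshold it finds should be close to, but is not guaranteed to be identical to, what cv2.threshold returns.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the level t that maximises between-class variance
    for the two classes (pixels <= t) and (pixels > t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    count_low = np.cumsum(hist)             # pixels <= t
    count_high = hist.sum() - count_low     # pixels > t
    sum_low = np.cumsum(hist * levels)
    sum_all = sum_low[-1]
    valid = (count_low > 0) & (count_high > 0)
    variance = np.zeros(256)
    mean_low = sum_low[valid] / count_low[valid]
    mean_high = (sum_all - sum_low[valid]) / count_high[valid]
    variance[valid] = count_low[valid] * count_high[valid] * (mean_low - mean_high) ** 2
    return int(np.argmax(variance))


def threshold_with_bias(gray, bias=0.9):
    """Binary threshold (255 where gray > t) at bias * Otsu threshold."""
    t = bias * otsu_threshold(gray)
    return t, np.where(gray > t, 255, 0).astype(np.uint8)


# Invented example: dark digit pixels (60), a greyish halo (90), bright background (200).
gray = np.concatenate([np.full(300, 60), np.full(100, 90), np.full(600, 200)])
gray = gray.astype(np.uint8).reshape(25, 40)

otsu_t = otsu_threshold(gray)
biased_t, biased_mask = threshold_with_bias(gray, bias=0.9)
plain_mask = np.where(gray > otsu_t, 255, 0).astype(np.uint8)
print("dark pixels, plain Otsu: ", np.count_nonzero(plain_mask == 0))
print("dark pixels, biased Otsu:", np.count_nonzero(biased_mask == 0))
```

With a factor below 1 the threshold drops, so fewer pixels count as dark and the halo falls on the background side. Whether that helps or hurts depends on the input, which is why the bias has to be tuned by hand.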

You end up with this binary mask:

Invert it and filter out the small blobs. I set the area threshold to 10 pixels:

# Invert image:
binaryImage = 255 - binaryImage

# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)

# Set the minimum pixels for the area filter:
minArea = 10

# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][cv2.CC_STAT_AREA] >= minArea]

# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype("uint8")
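
For readers without OpenCV at hand, the area filter can also be sketched with a plain breadth-first flood fill. This is a slow, illustrative stand-in for connectedComponentsWithStats, not a replacement for it: area_filter is a hypothetical name, and the tiny mask below is made up.

```python
from collections import deque

import numpy as np


def area_filter(binary, min_area):
    """Keep only 4-connected white blobs with at least min_area pixels."""
    height, width = binary.shape
    visited = np.zeros((height, width), dtype=bool)
    output = np.zeros((height, width), dtype=np.uint8)
    for row in range(height):
        for col in range(width):
            if binary[row, col] == 0 or visited[row, col]:
                continue
            # Breadth-first flood fill collects one connected blob.
            blob = [(row, col)]
            queue = deque(blob)
            visited[row, col] = True
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and binary[nr, nc] and not visited[nr, nc]:
                        visited[nr, nc] = True
                        queue.append((nr, nc))
                        blob.append((nr, nc))
            # Paint the blob into the output only if it is large enough.
            if len(blob) >= min_area:
                rows, cols = zip(*blob)
                output[list(rows), list(cols)] = 255
    return output


mask = np.zeros((8, 10), dtype=np.uint8)
mask[1:4, 1:5] = 255   # 3 x 4 blob, 12 pixels
mask[6, 8] = 255       # isolated speck, 1 pixel
cleaned = area_filter(mask, min_area=10)
print(np.count_nonzero(mask), "->", np.count_nonzero(cleaned))
```

The pure-Python loop is fine for small crops like these digit images but will be slow on large frames, where the cv2 call above is the better choice.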

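Here is the final binary mask:

If you plan to send this image to an OCR, you may want to apply some morphology first. Maybe try a closing to join the dots that make up the characters. Also make sure to train your OCR classifier with a font close to the one you are actually trying to recognize. This is the (inverted) mask after a rectangular closing of size 3 with 3 iterations:

Edit:

To get the last image, process the filtered output as follows: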

# Set kernel (structuring element) size:
kernelSize = 3

# Set operation iterations:
opIterations = 3

# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)

# Invert image to obtain black numbers on white background:
closingImage = 255 - closingImage
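
To see why a 3 x 3 closing with 3 iterations joins the dots, here is a small NumPy-only sketch on a made-up dotted line. It assumes the iterations are applied as three dilations followed by three erosions; morph3x3 and close3x3 are hypothetical helper names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def morph3x3(mask, reducer, iterations):
    """Repeat a 3 x 3 max (dilation) or min (erosion) filter."""
    out = mask
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="reflect")
        out = reducer(sliding_window_view(padded, (3, 3)), axis=(-2, -1))
    return out


def close3x3(mask, iterations=3):
    """Closing: all dilations first, then the same number of erosions."""
    dilated = morph3x3(mask, np.max, iterations)
    return morph3x3(dilated, np.min, iterations)


# Made-up dotted stroke: white dots every 4 pixels on a black canvas.
dotted = np.zeros((15, 37), dtype=np.uint8)
dotted[7, 6:31:4] = 255

joined = close3x3(dotted)
inverted = 255 - joined  # black stroke on white, as an OCR engine usually expects
print(np.count_nonzero(dotted), "white pixels before,", np.count_nonzero(joined), "after")
```

Each dilation grows a dot by one pixel in every direction, so three iterations bridge gaps of up to six pixels. Wider gaps need a larger kernel or more iterations, at the cost of merging strokes that should stay separate.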
