Splitting a multi-column image for OCR

Question

I'm trying to crop both columns from several pages like this one so that I can OCR them later. My plan is to split the page along the vertical line.

What I've got so far finds the header, so that it can be cropped out:

import cv2

image = cv2.imread('014-page1.jpg')
im_h, im_w, im_d = image.shape
base_image = image.copy()

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Create rectangular structuring element and dilate
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50,10))
dilate = cv2.dilate(thresh, kernel, iterations=1)

# Find contours and draw rectangle
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=lambda x: cv2.boundingRect(x)[1])
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    if h < 20 and w > 250:
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)

How could I split the page vertically and grab the text from the columns in sequence? Or alternatively, is there a better way to go about this?

Answer 1

Here's my take on the problem. It involves selecting a middle portion of the image, assuming the vertical line runs through the whole image (or at least passes through the middle of the page). I process this Region of Interest (ROI) and then reduce it to a single row. From that row I get the starting and ending horizontal coordinates of each crop. With this information I then produce the final cropped images.

I tried to make the algorithm general: it can split all the columns if the original image has more than two. Let's check out the code:

# Imports:
import numpy as np
import cv2

# Image path
path = "D://opencvImages//"
fileName = "pmALU.jpg"

# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)

# To grayscale:
grayImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

# Otsu Threshold:
_, binaryImage = cv2.threshold(grayImage, 0, 255, cv2.THRESH_OTSU)

# Get image dimensions:
(imageHeight, imageWidth) = binaryImage.shape[:2]

# Set middle ROI dimensions:
middleVertical = 0.5 * imageHeight
roiWidth = imageWidth
roiHeight = int(0.1 * imageHeight)
middleRoiVertical = 0.5 * roiHeight
roiY = int(0.5 * imageHeight - middleRoiVertical)

The first portion of the code computes the ROI. I set it to crop around the middle of the image. Let's visualize the ROI that will be used for processing:

The next step is to crop this:

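# Slice the ROI:
middleRoi = binaryImage[roiY:roiY + roiHeight, 0:imageWidth]
# Show and save the crop (plain OpenCV calls; the ".png" extension is an assumption):
cv2.imshow("middleRoi", middleRoi)
cv2.waitKey(0)
cv2.imwrite(path + "middleRoi.png", middleRoi)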

This produces the following crop:

Alright. The idea is to reduce this image to one row. If I take the minimum value of each column and store the results in one row, a column comes out white only when it contains no ink, so I should get a big white portion at the gap between the text columns, where the vertical line passes through.
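As a minimal sketch of that reduction, here is the same idea on a tiny synthetic strip. It uses NumPy's column-wise minimum as a stand-in for cv2.reduce with cv2.REDUCE_MIN; the function name reduce_to_row_min and the synthetic data are illustrative assumptions, not part of the answer's code.

```python
import numpy as np

def reduce_to_row_min(binary_strip):
    """Collapse a 2-D binary strip to one row using the column-wise minimum.

    Stand-in for cv2.reduce(strip, 0, cv2.REDUCE_MIN): a column stays
    white (255) only if no pixel in that column is ink (0).
    """
    return binary_strip.min(axis=0, keepdims=True)

# Synthetic strip: 255 = paper, 0 = ink (same polarity as a plain Otsu threshold).
strip = np.full((6, 15), 255, dtype=np.uint8)
strip[::2, 1:6] = 0     # "text" in the left column, on every other row
strip[1::2, 9:14] = 0   # "text" in the right column, on the remaining rows

reduced = reduce_to_row_min(strip)
print(reduced[0].tolist())
```

Columns 6 to 8 contain no ink in any row, so they are the only interior columns that stay white in the reduced row.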

Now, there's a problem here. If I reduce this image directly, this is the result (the following is an image of the reduced row):

The image is a little bit small, but you can see that the row has two black columns at the sides, followed by two white blobs. That's because the image was scanned; additionally, the text appears to be justified, which leaves margins at the sides. I only need the central white blob, with everything else in black.

I can solve this in two steps. First, draw a white rectangle around the image before reducing it; this takes care of the black columns. Then flood-fill with black at both ends of the reduced image:

# White rectangle around ROI:
rectangleThickness = int(0.01 * imageHeight)
cv2.rectangle(middleRoi, (0, 0), (roiWidth, roiHeight), 255, rectangleThickness)

# Image reduction to a row:
reducedImage = cv2.reduce(middleRoi, 0, cv2.REDUCE_MIN)

# Flood fill at the extreme corners:
fillPositions = [0, imageWidth - 1]

for i in range(len(fillPositions)):
    # Get flood-fill coordinate:
    x = fillPositions[i]
    currentCorner = (x, 0)
    fillColor = 0
    cv2.floodFill(reducedImage, None, currentCorner, fillColor)

Now, the reduced image looks like this:

Nice. But there's another problem. The central black line produced a "gap" at the center of the row. Not a problem really, because I can fill that gap with a morphological closing:

# Apply Closing:
kernel = np.ones((3, 3), np.uint8)
reducedImage = cv2.morphologyEx(reducedImage, cv2.MORPH_CLOSE, kernel, iterations=2)

This is the result. No more central gap:
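To see why a closing (dilate, then erode) is the operation that fills a dark gap inside a white run, here is a rough one-dimensional sketch in plain NumPy. The helpers dilate_1d, erode_1d and close_1d are hypothetical names written for this illustration; they only approximate what cv2.morphologyEx does on a single-row image, under the assumption that pixels outside the row never influence the result.

```python
import numpy as np

def dilate_1d(row, k=3):
    """Grow white runs: each pixel becomes the max of its k-wide neighbourhood."""
    pad = k // 2
    padded = np.pad(row, pad, mode="constant", constant_values=0)
    return np.array([padded[i:i + k].max() for i in range(len(row))], dtype=row.dtype)

def erode_1d(row, k=3):
    """Shrink white runs: each pixel becomes the min of its k-wide neighbourhood."""
    pad = k // 2
    padded = np.pad(row, pad, mode="constant", constant_values=255)
    return np.array([padded[i:i + k].min() for i in range(len(row))], dtype=row.dtype)

def close_1d(row, k=3, iterations=2):
    """Closing: `iterations` dilations followed by `iterations` erosions."""
    out = row
    for _ in range(iterations):
        out = dilate_1d(out, k)
    for _ in range(iterations):
        out = erode_1d(out, k)
    return out

# Black margins, a white blob split by a 2-pixel black gap, black margins.
row = np.array([0] * 5 + [255] * 3 + [0] * 2 + [255] * 3 + [0] * 5, dtype=np.uint8)
closed = close_1d(row)
print(closed.tolist())
```

The gap at indices 8 and 9 is filled, while the outer black margins keep their original extent.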

Cool. Let's get the horizontal positions (column indices) where the transitions from black to white and vice versa occur, starting at 0:

# Get horizontal transitions:
whiteSpaces = np.where(np.diff(reducedImage, prepend=np.nan))[1]
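A small sketch of what that one-liner returns, on a made-up single-row array. Prepending NaN makes the first difference NaN, which counts as non-zero, so index 0 is always reported; the remaining indices are the columns where the value changes.

```python
import numpy as np

# One-row "reduced image": black, a white blob at columns 2..4, black again.
reduced_row = np.array([[0, 0, 255, 255, 255, 0, 0]], dtype=np.uint8)

# The NaN prepend promotes the data to float, so the uint8 values cannot wrap
# around when subtracted, and the leading NaN difference marks index 0.
differences = np.diff(reduced_row, prepend=np.nan)

# np.where returns (row_indices, column_indices); only the columns matter here.
transitions = np.where(differences)[1]
print(transitions.tolist())
```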

I now know where to crop. Let's see:

# Crop the image:
colWidth = len(whiteSpaces)
spaceMargin = 0

for x in range(0, colWidth, 2):

    # Get horizontal cropping coordinates:
    if x != colWidth - 1:
        x2 = whiteSpaces[x + 1]
        spaceMargin = (whiteSpaces[x + 2] - whiteSpaces[x + 1]) // 2
    else:
        x2 = imageWidth

    # Set horizontal cropping coordinates:
    x1 = whiteSpaces[x] - spaceMargin
    x2 = x2 + spaceMargin

    # Clamp and Crop original input:
    x1 = clamp(x1, 0, imageWidth)
    x2 = clamp(x2, 0, imageWidth)

    currentCrop = inputImage[0:imageHeight, x1:x2]
    cv2.imshow("currentCrop", currentCrop)
    cv2.waitKey(0)

You'll note I calculate a margin. It is half the width of the white gap between columns, and it is added to both sides of each crop so the column margins are included. I also use a clamp function to make sure the horizontal cropping points are always within the image dimensions. This is the definition of that function (define it before running the loop above):

# Clamps an integer to a valid range:
def clamp(val, minval, maxval):
    if val < minval: return minval
    if val > maxval: return maxval
    return val
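The cropping arithmetic can also be checked without any image at all. The sketch below repeats the loop's logic on a plain list of transition indices; column_ranges is a hypothetical helper name, and it assumes the list has the shape produced above (it starts at 0 and has an odd number of entries).

```python
def clamp(val, minval, maxval):
    """Clamp an integer to the inclusive range [minval, maxval]."""
    return max(minval, min(val, maxval))

def column_ranges(transitions, image_width):
    """Turn transition indices into one (x1, x2) crop range per column.

    Assumes transitions = [0, gap1_start, gap1_end, gap2_start, gap2_end, ...].
    """
    ranges = []
    margin = 0
    last = len(transitions) - 1
    for i in range(0, len(transitions), 2):
        if i != last:
            x2 = transitions[i + 1]
            # Half of the white gap that follows this column.
            margin = (transitions[i + 2] - transitions[i + 1]) // 2
        else:
            # Last column: it runs to the right edge of the image.
            x2 = image_width
        x1 = clamp(transitions[i] - margin, 0, image_width)
        x2 = clamp(x2 + margin, 0, image_width)
        ranges.append((x1, x2))
    return ranges

two_columns = column_ranges([0, 100, 120], 220)
three_columns = column_ranges([0, 100, 120, 220, 240], 340)
print(two_columns, three_columns)
```

Each crop ends in the middle of the gap where the next one begins, so neighbouring crops share a boundary instead of overlapping.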

These are the results (resized for the post, open them in a new tab to see the full image):

Let's check out how this scales to more than two columns. This is a modification of the original input, with more columns added manually, just to check out the results:

These are the four images produced:

Answer 2

In order to separate the two columns, you have to find the dividing line in the center.

You can use a Sobel derivative filter along the x-axis to find the black vertical line. Follow this tutorial for more details on the Sobel operator.

sobel_vertical = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=3) # (1,0) for x direction derivatives

Extract the line position by thresholding the Sobel result:

ret, sobel_thresh = cv2.threshold(sobel_vertical,127,255,cv2.THRESH_BINARY)

Then scan the center columns for a column with a high concentration of white values.

One way to do this is to compute a column-wise sum and then find the column with the maximum value. But there are other ways to do it.

sum_cols = np.add.reduce(sobel_thresh, axis = 0)  # axis=0 sums down the rows: one total per column
max_col = np.argmax(sum_cols)
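Since the divider is expected near the middle, the search can be restricted to a central band so that strong responses at the page edges are ignored. The following is only a sketch under that assumption: find_divider_column and its center_fraction parameter are illustrative names, and the input is a synthetic array rather than a real Sobel result.

```python
import numpy as np

def find_divider_column(binary, center_fraction=0.2):
    """Return the column with the largest sum inside a central band of the image."""
    height, width = binary.shape
    half_band = max(1, int(width * center_fraction / 2))
    start = width // 2 - half_band
    stop = width // 2 + half_band
    # Sum down the rows (axis=0) so there is one total per column in the band.
    column_sums = binary[:, start:stop].sum(axis=0, dtype=np.int64)
    return start + int(np.argmax(column_sums))

# Synthetic thresholded edge map with two equally strong vertical responses.
edges = np.zeros((30, 40), dtype=np.uint8)
edges[:, 21] = 255   # the dividing line, near the center
edges[:, 3] = 255    # a page-edge response, outside the central band

divider = find_divider_column(edges)
unrestricted = int(np.argmax(edges.sum(axis=0)))
print(divider, unrestricted)
```

Without the band, argmax picks the first of the two equally strong columns, which here is the page-edge response rather than the divider.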

If there is no black dividing line, you can skip the Sobel step. Just downscale the image aggressively and search the center for columns with a high concentration of white pixels.

Answer 1: score 4, accepted, 2022-05-20 00:31:47

Answer 2: score 2, 2022-05-19 19:27:45