Detect whether an OCR text image is upside down

Question

I have several hundred images (scanned documents), and most of them are skewed. I want to deskew them using Python.
Here is the code I am using:

import numpy as np
import cv2

from skimage.transform import radon


filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I)  # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)

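This code works well for most documents, except at certain angles: (180 and 0) and (90 and 270) are often detected as the same angle, i.e. it cannot tell 180 from 0 or 90 from 270. So I end up with a lot of upside-down documents.

Here is an example:

The resulting image I get is identical to the input image.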
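To illustrate why a projection-based score cannot separate 0 from 180 degrees, here is a minimal sketch. It is an assumption-laden simplification: it uses plain row sums on a synthetic page instead of `skimage.transform.radon`, and `row_profile_rms` is a hypothetical helper name. Rotating by 180 degrees only reverses the order of the row sums, so any RMS-style score comes out the same.

```python
import numpy as np

def row_profile_rms(img):
    """RMS of the demeaned row sums; large when text lines run along the rows."""
    profile = img.sum(axis=1).astype(float)
    profile -= profile.mean()
    return float(np.sqrt(np.mean(profile ** 2)))

# Synthetic "page": five noisy horizontal text lines on a blank background.
rng = np.random.default_rng(0)
page = np.zeros((60, 80))
for top in range(5, 55, 10):
    page[top:top + 4, 5:75] = rng.random((4, 70)) > 0.5

flipped = np.rot90(page, 2)  # rotate the page by 180 degrees

score_upright = row_profile_rms(page)
score_flipped = row_profile_rms(flipped)
print(score_upright, score_flipped)  # the two scores match
```

The score only measures how "stripy" the projection is, not which way the stripes face, so a separate upside-down test is needed after deskewing.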

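Are there any suggestions for detecting whether an image is upside down using OpenCV and Python?
PS: I tried checking the orientation using the EXIF data, but that did not lead to a solution.

Edit:
It is possible to detect the orientation with Tesseract (pytesseract for Python), but only when the image contains a lot of characters.
For anyone who may need this: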

import cv2
import pytesseract


print(pytesseract.image_to_osd(cv2.imread(file_name)))

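If the document contains enough characters, Tesseract can detect the orientation. However, when the image has only a few lines of text, the orientation angle suggested by Tesseract is usually wrong. So this cannot be a 100% solution.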
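Since the OSD result is unreliable on sparse pages, one option is to read the confidence value and fall back to another method when it is low. The sketch below only parses the text that `image_to_osd` returns. The helper names `parse_osd` and `needs_flip`, the `min_confidence` threshold of 2.0, and the sample text are all assumptions for illustration, not values taken from the question.

```python
import re

def parse_osd(osd_text):
    """Pull the 'Rotate' angle and orientation confidence out of OSD text."""
    rotate = re.search(r"Rotate:\s*(\d+)", osd_text)
    conf = re.search(r"Orientation confidence:\s*([\d.]+)", osd_text)
    if rotate is None or conf is None:
        return None
    return int(rotate.group(1)), float(conf.group(1))

def needs_flip(osd_text, min_confidence=2.0):
    """Return True/False for a 180 degree flip, or None if OSD is not trustworthy.

    min_confidence is an arbitrary illustrative threshold; tune it on real scans.
    """
    parsed = parse_osd(osd_text)
    if parsed is None:
        return None
    angle, confidence = parsed
    if confidence < min_confidence:
        return None  # too few characters: use a different method
    return angle == 180

# Illustrative text in the layout pytesseract.image_to_osd() prints.
sample = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 9.42
Script: Latin
Script confidence: 3.10
"""
print(parse_osd(sample), needs_flip(sample))
```

In practice `sample` would be replaced by `pytesseract.image_to_osd(cv2.imread(file_name))`, and a `None` result would route the image to one of the projection-based checks from the answers.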

Answer 1

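A Python3/OpenCV4 script for aligning scanned documents.

Rotate the document and sum the rows. When the document is rotated by 0 or 180 degrees, there will be a lot of black pixels in the image:

Use a scoring method. Score each rotation by how closely it resembles a zebra pattern. The image with the best score has the correct rotation (in the code below, the best score is the smallest count of non-zero row sums). The image you linked to was off by 0.5 degrees. I left out some functions for readability; the full code can be found here.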

# Rotate the image around in a circle
scores = []  # score of every rotation tried so far
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)    
    # Crop the center 1/3rd of the image (roi is filled with text)
    h,w = img.shape
    buffer = min(h, w) - int(min(h,w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # Low score --> zebra stripes (fewer rows contain any text pixels)
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
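The helper functions were left out of the listing above, so here is a rough, NumPy-only sketch of what the scoring step might look like. `sum_rows` here is a guess at the omitted helper, `stripe_score` and `skew` are hypothetical names, and `skew` is only a crude stand-in for a small rotation (it shifts columns vertically instead of calling OpenCV). Text pixels are assumed to be non-zero on a black background.

```python
import numpy as np

def sum_rows(roi):
    """Sum of pixel values in each row (guess at the omitted helper)."""
    return roi.sum(axis=1, dtype=np.int64)

def stripe_score(roi):
    """Number of rows containing any text pixel; lower means cleaner stripes."""
    return int(np.count_nonzero(sum_rows(roi)))

def skew(img, slope):
    """Crude stand-in for a small rotation: shift each column down by slope * column."""
    out = np.zeros_like(img)
    for col in range(img.shape[1]):
        out[:, col] = np.roll(img[:, col], int(round(slope * col)))
    return out

# Synthetic page: eight text lines, each 5 rows tall, white on black.
page = np.zeros((100, 100), np.uint8)
for top in range(10, 90, 10):
    page[top:top + 5, :] = 255

aligned_score = stripe_score(page)
skewed_score = stripe_score(skew(page, 0.1))
print(aligned_score, skewed_score)
```

On the aligned page only the 40 text rows are non-zero, while the skewed copy smears text into the gaps and raises the count, which is why the loop keeps the rotation with the minimum score.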

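How do you tell whether the document is upside down? Fill the area from the top of the document down to the first non-black pixel in the image. Measure that area (shown in yellow). The image with the smaller area is the one that is right side up: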

# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
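`area_to_top_of_text` is another helper that was omitted, so the following is only a sketch of the idea under stated assumptions: text pixels are non-zero on a black background, and the area is measured per column as the number of rows above the first text pixel. The name `area_above_text` and the synthetic page are hypothetical.

```python
import numpy as np

def area_above_text(img):
    """Total number of pixels between the top edge and the first text pixel per column."""
    mask = img > 0
    first = mask.argmax(axis=0)                # row index of first text pixel
    first[~mask.any(axis=0)] = img.shape[0]    # empty columns count the full height
    return int(first.sum())

# Synthetic page with equal top and bottom margins and a short last line,
# as at the end of a paragraph.
page = np.zeros((95, 100), np.uint8)
for top in range(10, 80, 10):
    page[top:top + 5, 10:90] = 255   # full-width lines
page[80:85, 10:40] = 255             # short final line

upright_area = area_above_text(page)
flipped_area = area_above_text(np.rot90(page, 2))
print(upright_area, flipped_area)
```

The upright page gives the smaller area because its first line spans the full width, whereas the flipped page starts with the short line and leaves a larger gap above the rest of the text. Pages that end on a full line would not be separated by this measure.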

Answer 2

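Assuming you have already run the angle correction on the image, you can try the following to find out whether it is flipped:

1. Project the corrected image onto the y-axis, so that you get a "peak" for each line. Important: there are actually almost always two sub-peaks!
2. Smooth this projection by convolving it with a Gaussian, to get rid of fine structure, noise, etc.
3. For each peak, check whether the stronger sub-peak is at the top or at the bottom.
4. Calculate the fraction of peaks that have their sub-peak at the bottom. This is your scalar value that tells you how confident you can be that the image is oriented correctly.

The peak finding in step 3 is done by finding the sections with above-average values. The sub-peaks are then found via argmax.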

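Here is a plot illustrating the method, using a few lines of your example image:

Blue: original projection
Orange: smoothed projection
Horizontal line: mean of the smoothed projection over the whole image.

Here is some code that does this: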

import cv2
import numpy as np

# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]

# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)

# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]

# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1

print(lower_peaks / len(line_starts))

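This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.

Note that this approach may break badly if the page contains images or anything else that is not organized in lines (math or pictures, for example). Another problem is having too few lines, which results in poor statistics.

Different fonts may also produce different distributions. You can try this on a few images and see whether the approach works. I don't have enough data.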
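One way to guard against the "too few lines" problem is to wrap the same steps in a function that refuses to answer when it finds fewer than some minimum number of lines. This is only a sketch: `early_peak_fraction` and `min_lines=5` are assumptions, the page is synthetic, and whether an "early" peak means upright or flipped depends on the font, so the threshold has to be calibrated on real scans.

```python
import numpy as np

def early_peak_fraction(binary_page, min_lines=5):
    """Fraction of text lines whose smoothed peak lies in the first half of the line.

    Returns None when fewer than min_lines lines are found (statistics too poor).
    """
    projection = binary_page.sum(axis=1).astype(float)
    kernel = np.exp(-(np.arange(-3, 3, 0.1) ** 2))
    kernel /= kernel.sum()
    smooth = np.convolve(projection, kernel, mode="same")

    # Lines are the runs where the smoothed projection is above its mean.
    mask = (smooth > smooth.mean()).astype(int)
    edges = np.diff(np.concatenate(([0], mask, [0])))
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0]
    if len(starts) < min_lines:
        return None

    early = sum(np.argmax(smooth[s:e]) < (e - s) / 2 for s, e in zip(starts, ends))
    return float(early) / len(starts)

# Synthetic page: six "lines", each with a dense band followed by a sparse band.
page = np.zeros((520, 100), np.uint8)
for top in range(40, 520, 80):
    page[top:top + 10, 5:95] = 1        # dense band (90 pixels per row)
    page[top + 10:top + 40, 5:35] = 1   # sparse band (30 pixels per row)

print(early_peak_fraction(page))                # dense band first
print(early_peak_fraction(np.rot90(page, 2)))   # dense band last
print(early_peak_fraction(page[:200]))          # only two lines: no answer
```

Returning `None` instead of a number makes it easy to hand such pages to a different check, such as Tesseract's OSD or a manual review.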

Answer 3

You can use the Alyn module. To install it:

pip install alyn

Then use it to deskew the image (taken from the home page):

from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()

Note that Alyn is only meant for deskewing text.

Answer 1: 30 votes, accepted, 2019-04-17 22:09:17
Answer 2: 7 votes, 2019-04-17 21:42:32
Answer 3: 1 vote, 2019-04-17 15:06:40
Answer 4: -4 votes, 2019-11-19 11:17:13
