How to detect if text is rotated 180 degrees or flipped upside down

I am working on a text recognition project. The text may be rotated 180 degrees. I tried tesseract-ocr from the terminal, but had no luck. Is there any way to detect the rotation and correct it? An example of the text is shown below.

[image]

tesseract input.png output

tesseract input.png - --psm 0 -c min_characters_to_try=10

Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.74
Script: Latin
Script confidence: 1.67
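The report above is plain `key: value` text, so it can be read programmatically. The following is a minimal sketch that assumes the output format shown above; `parse_osd` is a hypothetical helper name, not part of Tesseract.

```python
import re

def parse_osd(osd_text):
    """Parse 'key: value' lines of an OSD report into a dict (hypothetical helper)."""
    result = {}
    for line in osd_text.splitlines():
        # Keep only lines shaped like "Key: value"; the dpi warning has no colon.
        match = re.match(r"^([^:]+):\s*(.+)$", line.strip())
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        try:
            result[key] = float(value) if "." in value else int(value)
        except ValueError:
            result[key] = value  # non-numeric value, e.g. "Latin"
    return result

sample = """Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.74
Script: Latin
Script confidence: 1.67"""

osd = parse_osd(sample)
needs_flip = osd["Rotate"] == 180
```

With the sample report, `needs_flip` is `True`. The confidence value can be compared against a threshold of your choosing before trusting the result.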

One simple approach to detect if text is rotated 180 degrees is to use the observation that, in correctly oriented text, the foreground pixels tend to be skewed towards the bottom. Here's the strategy:

  • Convert the image to grayscale
  • Apply a Gaussian blur
  • Threshold the image
  • Find the top/bottom half ROIs of the thresholded image
  • Count the non-zero array elements in each half

Threshold image

[image]

Find ROIs of top and bottom half

[image]

[image]

Next we split the top/bottom sections

[image]

For each half we count the non-zero array elements using cv2.countNonZero(). We get this:

('top', 4035)
('bottom', 3389)

Compare the two counts: if the top half has more foreground pixels than the bottom half, the image is upside down (rotated 180 degrees). If it has fewer, it is correctly oriented.
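The comparison rule can be isolated into a small function. This is a minimal sketch using NumPy only; `is_upside_down` is a hypothetical helper name, and it assumes the input is a 2-D binary image in which text pixels are non-zero.

```python
import numpy as np

def is_upside_down(thresh):
    """Return True if the top half holds more foreground pixels than the bottom half.

    `thresh` is assumed to be a 2-D binary image where text pixels are non-zero.
    """
    mid = thresh.shape[0] // 2  # integer row index of the split
    top_pixels = np.count_nonzero(thresh[:mid])
    bottom_pixels = np.count_nonzero(thresh[mid:])
    return top_pixels > bottom_pixels

# Synthetic example: a 4x4 "image" with more ink in the top half
demo = np.zeros((4, 4), dtype=np.uint8)
demo[0, :] = 255   # 4 foreground pixels in the top half
demo[3, 0] = 255   # 1 foreground pixel in the bottom half

print(is_upside_down(demo))        # True
print(is_upside_down(demo[::-1]))  # False (rows reversed)
```

For an odd image height, the middle row is counted in the bottom half in this sketch. When the two counts are close, the heuristic is unlikely to be reliable.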

Now that we can detect whether the image is upside down, we can rotate it using this function:

def rotate(image, angle):
    # Obtain the dimensions of the image
    (height, width) = image.shape[:2]
    (cX, cY) = (width / 2, height / 2)

    # Grab the rotation components of the matrix
    matrix = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(matrix[0, 0])
    sin = np.abs(matrix[0, 1])

    # Find the new bounding dimensions of the image
    new_width = int((height * sin) + (width * cos))
    new_height = int((height * cos) + (width * sin))

    # Adjust the rotation matrix to take into account translation
    matrix[0, 2] += (new_width / 2) - cX
    matrix[1, 2] += (new_height / 2) - cY

    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, matrix, (new_width, new_height))

Rotating the image

rotated = rotate(original_image, 180)
cv2.imshow("rotated", rotated)

which gives us the correct result

[image]
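For the specific case of 180 degrees, a general rotation matrix is not strictly needed: reversing both the row order and the column order gives the same orientation without interpolation. This is a minimal sketch using NumPy slicing as a swapped-in alternative to the rotate() function above; `rotate_180` is a hypothetical helper name.

```python
import numpy as np

def rotate_180(image):
    """Rotate an image array by 180 degrees by reversing rows and columns."""
    # Only the first two axes are reversed, so colour channels stay in order.
    return image[::-1, ::-1].copy()

demo = np.array([[1, 2, 3],
                 [4, 5, 6]], dtype=np.uint8)
flipped = rotate_180(demo)
# flipped is [[6, 5, 4], [3, 2, 1]]
```

OpenCV also provides cv2.rotate(image, cv2.ROTATE_180) for the same purpose, which may be preferable if the rest of the pipeline already uses OpenCV.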

These are the pixel counts when the image is correctly oriented:

('top', 3209)
('bottom', 4206)

Full code

import numpy as np
import cv2

def rotate(image, angle):
    # Obtain the dimensions of the image
    (height, width) = image.shape[:2]
    (cX, cY) = (width / 2, height / 2)

    # Grab the rotation components of the matrix
    matrix = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(matrix[0, 0])
    sin = np.abs(matrix[0, 1])

    # Find the new bounding dimensions of the image
    new_width = int((height * sin) + (width * cos))
    new_height = int((height * cos) + (width * sin))

    # Adjust the rotation matrix to take into account translation
    matrix[0, 2] += (new_width / 2) - cX
    matrix[1, 2] += (new_height / 2) - cY

    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, matrix, (new_width, new_height))

image = cv2.imread("1.PNG")
original_image = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blurred, 110, 255, cv2.THRESH_BINARY_INV)[1]
cv2.imshow("thresh", thresh)

x, y, w, h = 0, 0, image.shape[1], image.shape[0]

# Use integer division so the coordinates are valid slice indices in Python 3
top_half = ((x,y), (x+w, y+h//2))
bottom_half = ((x,y+h//2), (x+w, y+h))

top_x1,top_y1 = top_half[0]
top_x2,top_y2 = top_half[1]
bottom_x1,bottom_y1 = bottom_half[0]
bottom_x2,bottom_y2 = bottom_half[1]

# Split into top/bottom ROIs
top_image = thresh[top_y1:top_y2, top_x1:top_x2]
bottom_image = thresh[bottom_y1:bottom_y2, bottom_x1:bottom_x2]

cv2.imshow("top_image", top_image)
cv2.imshow("bottom_image", bottom_image)

# Count non-zero array elements
top_pixels = cv2.countNonZero(top_image)
bottom_pixels = cv2.countNonZero(bottom_image)

print('top', top_pixels)
print('bottom', bottom_pixels)

# Rotate if upside down
if top_pixels > bottom_pixels:
    rotated = rotate(original_image, 180)
    cv2.imshow("rotated", rotated)

cv2.waitKey(0)

I kind of liked the pytesseract solution.

import cv2 
import pytesseract
from scipy.ndimage import rotate as Rotate 

def float_convertor(x):
    if x.isdigit():
        out= float(x)
    else:
        out= x
    return out 

def tesseract_find_rotation(img: str):
    # Accept either a file path or an already-loaded image array
    img = cv2.imread(img) if isinstance(img, str) else img
    # Run Tesseract's orientation and script detection (OSD)
    k = pytesseract.image_to_osd(img)
    # Parse the "key: value" lines of the OSD report into a dict
    out = {i.split(":")[0]: float_convertor(i.split(":")[-1].strip()) for i in k.rstrip().split("\n")}
    img_rotated = Rotate(img, 360-out["Rotate"])
    return img_rotated, out

Usage

img_loc = ""
img_rotated, out = tesseract_find_rotation(img_loc)
