How to detect if text is rotated 180 degrees or flipped upside down
I am working on a text recognition project. There is a chance the text is rotated 180 degrees. I have tried tesseract-ocr on the terminal, but with no luck. Is there any way to detect this and correct it? An example of the text is shown below.
tesseract input.png output
tesseract input.png - --psm 0 -c min_characters_to_try=10
Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.74
Script: Latin
Script confidence: 1.67
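The Rotate field in the report above is the angle that corrects the page. As a minimal sketch, assuming the report has been captured as a string (for example from the command's standard output), the fields could be parsed as follows. The helper name parse_osd is hypothetical and is not part of Tesseract.

```python
def parse_osd(report: str) -> dict:
    """Parse 'Key: value' lines from a Tesseract OSD report into a dict.

    parse_osd is a hypothetical helper, not part of Tesseract.
    """
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'Key: value' shape, e.g. warnings
        value = value.strip()
        try:
            fields[key.strip()] = float(value)
        except ValueError:
            fields[key.strip()] = value  # non-numeric, e.g. the script name
    return fields


# The report text is copied from the output shown above.
sample = """Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.74
Script: Latin
Script confidence: 1.67"""

osd = parse_osd(sample)
needs_rotation = osd["Rotate"] != 0
```

Lines without a colon, such as the resolution warning, are skipped rather than treated as errors.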
One simple approach to detect whether text is rotated 180 degrees is to use the observation that text tends to be skewed towards the bottom. Here's the strategy:
Threshold the image
Find the ROIs of the top and bottom halves
Next we split the image into the top and bottom sections.
With each half, we count the non-zero array elements using cv2.countNonZero(). We get this:
('top', 4035)
('bottom', 3389)
By comparing the values between the two halves, if the top half has more pixels than the bottom half, the image is upside down (rotated 180 degrees). If it has fewer, it is correctly oriented.
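The comparison itself can be sketched without OpenCV. This is only an illustration on a tiny, made-up binary grid; the helper name is_upside_down and the sample data are hypothetical, and a real image would go through the thresholding step first.

```python
def is_upside_down(binary_rows):
    """Return True when the top half holds more foreground pixels than the bottom.

    With an odd number of rows, the middle row is counted in the bottom half.
    """
    middle = len(binary_rows) // 2
    top = sum(1 for row in binary_rows[:middle] for value in row if value != 0)
    bottom = sum(1 for row in binary_rows[middle:] for value in row if value != 0)
    return top > bottom


# Hypothetical 4x4 thresholded image: 255 marks a text pixel, 0 the background.
upside_down = [
    [255, 255, 255, 0],
    [255, 255, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 0],
]
upright = upside_down[::-1]  # reversing the row order swaps the two halves

result_upside_down = is_upside_down(upside_down)
result_upright = is_upside_down(upright)
```

Equal counts are treated as correctly oriented here, which matches the strict "more pixels" comparison described above.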
Now that we have detected whether it is upside down, we can rotate it using this function:
def rotate(image, angle):
    # Obtain the dimensions of the image
    (height, width) = image.shape[:2]
    (cX, cY) = (width / 2, height / 2)
    # Grab the rotation components of the matrix
    matrix = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(matrix[0, 0])
    sin = np.abs(matrix[0, 1])
    # Find the new bounding dimensions of the image
    new_width = int((height * sin) + (width * cos))
    new_height = int((height * cos) + (width * sin))
    # Adjust the rotation matrix to take into account translation
    matrix[0, 2] += (new_width / 2) - cX
    matrix[1, 2] += (new_height / 2) - cY
    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, matrix, (new_width, new_height))
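The bounding-size arithmetic in this function can be checked on its own. The sketch below reproduces only that step with the standard math module; rotated_bounds is a hypothetical helper, and it uses round() where the function above truncates with int().

```python
import math


def rotated_bounds(width, height, angle_degrees):
    """Size of the axis-aligned box that contains the rotated image."""
    radians = math.radians(angle_degrees)
    cos = abs(math.cos(radians))
    sin = abs(math.sin(radians))
    new_width = round(height * sin + width * cos)
    new_height = round(height * cos + width * sin)
    return new_width, new_height


size_180 = rotated_bounds(200, 100, 180)  # canvas size is unchanged
size_90 = rotated_bounds(200, 100, 90)    # width and height swap
```

For a 180 degree rotation the canvas keeps its size, so the translation terms added to the matrix are zero in that case.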
Rotating the image
rotated = rotate(original_image, 180)
cv2.imshow("rotated", rotated)
which gives us the correct result
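For the specific case of 180 degrees, the rotation is equivalent to reversing both the row order and the column order, so no interpolation is needed. A minimal sketch on a plain nested list follows; the helper rotate_180 is hypothetical, and on a NumPy array the same idea can be written as image[::-1, ::-1].

```python
def rotate_180(rows):
    """Rotate a 2-D grid by 180 degrees: reverse the row order, then each row."""
    return [row[::-1] for row in rows[::-1]]


grid = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = rotate_180(grid)
restored = rotate_180(flipped)  # rotating twice returns the original grid
```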
This is the pixel result if the image was correctly oriented
('top', 3209)
('bottom', 4206)
Full code
import numpy as np
import cv2

def rotate(image, angle):
    # Obtain the dimensions of the image
    (height, width) = image.shape[:2]
    (cX, cY) = (width / 2, height / 2)
    # Grab the rotation components of the matrix
    matrix = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(matrix[0, 0])
    sin = np.abs(matrix[0, 1])
    # Find the new bounding dimensions of the image
    new_width = int((height * sin) + (width * cos))
    new_height = int((height * cos) + (width * sin))
    # Adjust the rotation matrix to take into account translation
    matrix[0, 2] += (new_width / 2) - cX
    matrix[1, 2] += (new_height / 2) - cY
    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, matrix, (new_width, new_height))

image = cv2.imread("1.PNG")
original_image = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (3, 3), 0)
thresh = cv2.threshold(blurred, 110, 255, cv2.THRESH_BINARY_INV)[1]
cv2.imshow("thresh", thresh)

x, y, w, h = 0, 0, image.shape[1], image.shape[0]
# Use integer division so the coordinates are valid slice indices in Python 3
top_half = ((x, y), (x + w, y + h // 2))
bottom_half = ((x, y + h // 2), (x + w, y + h))
top_x1, top_y1 = top_half[0]
top_x2, top_y2 = top_half[1]
bottom_x1, bottom_y1 = bottom_half[0]
bottom_x2, bottom_y2 = bottom_half[1]

# Split into top/bottom ROIs
top_image = thresh[top_y1:top_y2, top_x1:top_x2]
bottom_image = thresh[bottom_y1:bottom_y2, bottom_x1:bottom_x2]
cv2.imshow("top_image", top_image)
cv2.imshow("bottom_image", bottom_image)

# Count non-zero array elements
top_pixels = cv2.countNonZero(top_image)
bottom_pixels = cv2.countNonZero(bottom_image)
print('top', top_pixels)
print('bottom', bottom_pixels)

# Rotate if upside down
if top_pixels > bottom_pixels:
    rotated = rotate(original_image, 180)
    cv2.imshow("rotated", rotated)
cv2.waitKey(0)
I kind of liked the pytesseract solution.
import cv2
import pytesseract
from scipy.ndimage import rotate as Rotate

def float_convertor(x):
    # Convert numeric strings (including decimals such as "0.74") to float
    try:
        return float(x)
    except ValueError:
        return x

def tesseract_find_rotation(img):
    # Accept either a file path or an already loaded image array
    img = cv2.imread(img) if isinstance(img, str) else img
    k = pytesseract.image_to_osd(img)
    # Turn each "Key: value" line of the OSD report into a dictionary entry
    out = {i.split(":")[0]: float_convertor(i.split(":")[-1].strip()) for i in k.rstrip().split("\n")}
    img_rotated = Rotate(img, 360 - out["Rotate"])
    return img_rotated, out

img_loc = ""  # path to the input image
img_rotated, out = tesseract_find_rotation(img_loc)