Currently, I am working on an OCR project where I need to read the text off a label (see the example images below). I am running into issues with image skew and need help correcting it so the text is horizontal rather than at an angle. The process I currently use scores each angle in a given range (code included below), but it is inconsistent: it sometimes overcorrects the skew, and sometimes it fails outright to detect and correct it. As a note, before the skew correction I rotate all of the images by 270 degrees to get the text upright, and then I pass the image through the code below. The image passed to the function is already a binary image.
Code:
import cv2
import numpy as np
from scipy import ndimage as inter  # provides rotate(); same function as the older scipy.ndimage.interpolation

def findScore(img, angle):
    """
    Generates a score for the binary image received, dependent on the given angle.
    Vars:
    - img <- numpy array of the label
    - angle <- predicted angle by which the image is rotated
    Returns:
    - histogram of the image
    - score of potential angle
    """
    data = inter.rotate(img, angle, reshape = False, order = 0)
    hist = np.sum(data, axis = 1)
    score = np.sum((hist[1:] - hist[:-1]) ** 2)
    return hist, score

def skewCorrect(img):
    """
    Takes in a numpy array, determines the skew angle of the text, then corrects the skew and returns the corrected image.
    Vars:
    - img <- numpy array of the label
    Returns:
    - Corrected image as a numpy array
    """
    # Downscale the image to 75% to determine the skew angle
    # (note: the downscaled image is also the one that gets rotated and returned)
    img = cv2.resize(img, (0, 0), fx = 0.75, fy = 0.75)
    delta = 1
    limit = 45
    angles = np.arange(-limit, limit+delta, delta)
    scores = []
    for angle in angles:
        hist, score = findScore(img, angle)
        scores.append(score)
    bestScore = max(scores)
    bestAngle = angles[scores.index(bestScore)]
    rotated = inter.rotate(img, bestAngle, reshape = False, order = 0)
    print("[INFO] angle: {:.3f}".format(bestAngle))
    #cv2.imshow("Original", img)
    #cv2.imshow("Rotated", rotated)
    #cv2.waitKey(0)
    #Return img
    return rotated
[Example images of the label before and after correction]
If anyone can help me figure this problem out, it would be much appreciated.
Here's an implementation of the Projection Profile Method to determine skew. After obtaining a binary image, the idea is to rotate the image at various angles and generate a histogram of pixels (the sum of each row) in each iteration. Each angle is scored by the sum of squared differences between neighboring histogram values, which is largest when the text rows line up horizontally. The angle with the highest score is taken as the skew angle, and the image is rotated by that angle to correct the skew.
[Images: left, original; right, corrected]
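To see why this score picks the right angle, here is a tiny worked example on toy arrays (profile_score and both arrays are made up for illustration, not taken from the answer). When the text rows are horizontal, the row sums alternate sharply between full and empty rows, so the squared neighbor differences are large; when the same pixels are spread across rows, as a tilted line would be, the row sums flatten and the score drops.

```python
import numpy as np

def profile_score(binary):
    # Row sums: the "histogram" from the answer; float avoids integer overflow
    row_sums = binary.sum(axis=1, dtype=float)
    # Sum of squared differences between neighboring row sums
    score = float(np.sum((row_sums[1:] - row_sums[:-1]) ** 2))
    return row_sums, score

# Two perfectly horizontal "text lines"
aligned = np.array([[0, 0, 0, 0],
                    [1, 1, 1, 1],
                    [0, 0, 0, 0],
                    [1, 1, 1, 1]])

# The same 8 foreground pixels arranged as a crude staircase (a stand-in for a tilt)
tilted = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1]])

print(profile_score(aligned))  # row sums 0, 4, 0, 4 -> score 48.0
print(profile_score(tilted))   # row sums 2, 2, 2, 2 -> score 0.0
```

Both inputs contain the same amount of "ink"; only the arrangement differs, which is exactly what the angle search exploits.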
import cv2
import numpy as np
from scipy import ndimage as inter  # provides rotate(); the older scipy.ndimage.interpolation path is deprecated

def correct_skew(image, delta=1, limit=5):
    def determine_score(arr, angle):
        data = inter.rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1)
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)

    return best_angle, rotated

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle, rotated = correct_skew(image)
    print(angle)
    cv2.imshow('rotated', rotated)
    cv2.imwrite('rotated.png', rotated)
    cv2.waitKey()
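If whole-degree steps are too coarse, or a wide range such as the question's ±45 degrees is slow to scan, one option is a coarse-to-fine search: scan the range in whole degrees, then rescan a narrow band around the winner in tenths of a degree. This is a sketch of that refinement under my own assumptions, not part of the answer above; score_angle and best_angle_coarse_to_fine are hypothetical names, and the synthetic bar image only stands in for a real label.

```python
import numpy as np
from scipy import ndimage

def score_angle(binary, angle):
    # Same projection-profile score as determine_score(), with float sums
    data = ndimage.rotate(binary, angle, reshape=False, order=0)
    row_sums = data.sum(axis=1, dtype=float)
    return float(np.sum((row_sums[1:] - row_sums[:-1]) ** 2))

def best_angle_coarse_to_fine(binary, limit=10, coarse_step=1.0, fine_step=0.1):
    # Pass 1: coarse grid over [-limit, limit]
    coarse = np.arange(-limit, limit + coarse_step, coarse_step)
    best = max(coarse, key=lambda a: score_angle(binary, a))
    # Pass 2: finer grid within one coarse step on either side of the coarse winner
    fine = np.arange(best - coarse_step, best + coarse_step + fine_step, fine_step)
    return float(max(fine, key=lambda a: score_angle(binary, a)))

# Synthetic "label": horizontal white bars on black, then tilted by 3 degrees
img = np.zeros((200, 200), dtype=np.uint8)
for y in range(20, 180, 20):
    img[y:y + 6, 20:180] = 255
tilted = ndimage.rotate(img, 3, reshape=False, order=0)

estimated = best_angle_coarse_to_fine(tilted)
print(estimated)  # close to -3, the rotation that undoes the tilt
```

The second pass costs about as many rotations as the first, so the total work is roughly twice a whole-degree scan while the resolution improves tenfold. Whether this helps with the overcorrection in the question depends on the images, so treat it as something to try rather than a fix.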
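ASSUMPTIONS:
1. The tilt of the content is within the [-45:45] degree range.
2. The content as a whole fits well into a single rectangle, i.e. the overall tightest fit around it is a rectangle.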
SOLUTION:
hgt_rot_angle = cv2.minAreaRect(your_CLEAN_image_pixel_coordinates_to_enclose)[-1]
com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle
(h, w) = your_ORIGINAL_image.shape[0:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, com_rot_angle, 1.0)
corrected_image = cv2.warpAffine(your_ORIGINAL_image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
ORIGINAL SOURCE:
https://www.pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/ - a GREAT tutorial to get started (kudos to Adrian Rosebrock), BUT:
1. The explanation of cv2.minAreaRect() is not quite clear there, and the code uses the same variable for detection and for correction, which is even more confusing. I used separate variables for clarity, and my explanation of the first two lines of code is below.
2. I believe the sign of the angle passed to the cv2.getRotationMatrix2D() function is handled incorrectly there, based on the OpenCV documentation and on my testing. More on this below as well.
SOLUTION EXPLANATION:
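The cv2.minAreaRect() function returns the rotation angle value in the [-90, 0] range as the last element of the returned tuple, and the angle value is tied to the HEIGHT value in the same returned tuple (it's located at cv2.minAreaRect()[1][1], to be precise, but we're not using it here). This describes the OpenCV versions this was written against; newer releases (4.5.1 and later) reportedly return angles between 0 and 90 degrees instead, so check what your version returns.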
Unless the angle of rotation is either -90.0 or 0.0, the decision of which dimension is chosen as the "height" is not arbitrary: it always has to go from upper left to lower right, i.e. to have a negative slope.
What this means for our use case is that, depending on the width-height proportion of the content block and on its tilt, the "height" value returned by cv2.minAreaRect() can be either the content block's logical height OR its width.
This means 2 things for us:
So, given (1) no assumptions about the content block's aspect ratio and (2) the assumed [-45:45] range of the tilt, we can get the common tilt of the height and the width relative to the rectangular coordinate system (in the [-45:45] range) by simply adding 90 degrees to the rotation value of the "height" if it falls below -45.0.
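The "+ 90 if below -45" step can be checked directly. This sketch compares it with an equivalent modulo form (answer_rule and fold_to_45 are hypothetical helper names): because a rectangle looks the same after a 90-degree turn, any reported angle can be shifted by a multiple of 90 into [-45, 45), and for every angle in [-90, 0] both forms agree. The modulo form might also be handy if your OpenCV version reports angles outside [-90, 0], but that is an assumption to verify on your own setup.

```python
def answer_rule(hgt_rot_angle):
    # The rule from the SOLUTION: add 90 degrees when the "height" angle is below -45
    return hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle

def fold_to_45(angle):
    # A rectangle is unchanged by a 90-degree turn, so shift by multiples of 90 into [-45, 45)
    return ((angle + 45) % 90) - 45

for a in (-90.0, -80.0, -45.5, -45.0, -10.0, 0.0):
    print(a, answer_rule(a), fold_to_45(a))
```

For example, -80.0 (the "height" is really the block's width) becomes 10.0 under both forms, while -10.0 is already in range and is left alone.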
Once we have this detected and calculated "common rotation angle" value, we can use it to fix the tilt by just passing the value directly to the cv2.getRotationMatrix2D() function.
NOTE: the calculated "common rotation angle" is negative for a counter-clockwise tilt and positive for a clockwise tilt, which is a very common everyday convention. However, if we think of the angle argument of cv2.getRotationMatrix2D() as "the correction angle to apply" (which, I think, was the intent), then the sign convention is the OPPOSITE. So we need to pass the detected and calculated "common rotation angle" value as-is if we want to see it counteracted in the output image, which is supported by the many tests that I have performed.
This is a direct quote on the angle parameter from the OpenCV documentation:
Rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the top-left corner).
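To make that sign convention concrete, here is a NumPy-only sketch of the 2x3 matrix the OpenCV documentation gives for cv2.getRotationMatrix2D, with alpha = scale·cos(angle) and beta = scale·sin(angle). rotation_matrix_2d is a hypothetical re-implementation so the example runs without OpenCV; cv2.getRotationMatrix2D should produce the same numbers. Because the y axis points down in image coordinates, a point to the right of the center that ends up at y = -1 for angle = 90 has moved up on screen, i.e. counter-clockwise, as the quote says.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    # Same 2x3 layout as documented for cv2.getRotationMatrix2D
    cx, cy = center
    alpha = scale * np.cos(np.deg2rad(angle_deg))
    beta = scale * np.sin(np.deg2rad(angle_deg))
    return np.array([[alpha, beta, (1 - alpha) * cx - beta * cy],
                     [-beta, alpha, beta * cx + (1 - alpha) * cy]])

M = rotation_matrix_2d((0.0, 0.0), 90)
# The point (x=1, y=0), right of the center, in homogeneous form
moved = M @ np.array([1.0, 0.0, 1.0])
print(np.round(moved, 6))  # x becomes 0, y becomes -1: the point moved up on screen
```

The translation column is what keeps the chosen center fixed, which is why the answers rotate about (w // 2, h // 2) without any extra shifting.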
WHAT IF THE SINGLE RECTANGLE IS A POOR FIT?
The above solution works very well for densely populated full-page scans, clean labels and the like, but it does not work well at all for sparsely populated images, where the overall tightest fit is not a rectangle, i.e. when the 2nd starting assumption does not hold.
In the latter scenario the following may work IF most of the individual shapes in the input image can nicely fit into rectangles, or at least better than all of the content combined:
OTHER SOURCES:
https://www.pyimagesearch.com/2015/11/30/detecting-machine-readable-zones-in-passport-images/
https://docs.opencv.org/master/dd/d49/tutorial_py_contour_features.html
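To add to @nathancy's answer: for Windows users, if you're getting additional skew, just add dtype=float to the np.sum() calls. There's an integer overflow issue on Windows, where NumPy (before version 2.0) uses a 32-bit integer as the default data type, unlike on other systems; for a uint8 image, the row sums then come out as 32-bit unsigned integers, and squaring their differences can overflow.
See the code below, with dtype=float added to the np.sum() calls: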
import cv2
import numpy as np
from scipy import ndimage as inter  # provides rotate(); the older scipy.ndimage.interpolation path is deprecated

def correct_skew(image, delta=1, limit=5):
    def determine_score(arr, angle):
        data = inter.rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1, dtype=float)
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2, dtype=float)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)

    return best_angle, rotated

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle, rotated = correct_skew(image)
    print(angle)
    cv2.imshow('rotated', rotated)
    cv2.imwrite('rotated.png', rotated)
    cv2.waitKey()
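The overflow can be reproduced without any image. Since 65,536 squared is exactly 2**32, a difference of more than 65,535 between two neighboring row sums (about 257 fully white pixels at value 255) already overflows when squared in 32-bit unsigned arithmetic, and the wrapped-around scores can then pick the wrong angle. This sketch forces the 32-bit type with astype so it behaves the same on any platform; the row-sum values are made up for illustration.

```python
import numpy as np

# Two neighboring row sums that differ by 100,000 (about 392 white pixels at value 255)
row_sums = np.array([0, 100_000])

as_u32 = row_sums.astype(np.uint32)   # what a 32-bit default integer produces
as_f = row_sums.astype(float)         # what dtype=float produces

sq_u32 = (as_u32[1:] - as_u32[:-1]) ** 2
sq_f = (as_f[1:] - as_f[:-1]) ** 2

print(int(sq_u32[0]))  # 1410065408, i.e. 10**10 wrapped modulo 2**32
print(sq_f[0])         # 10000000000.0, the true squared difference
```

With float sums, every squared difference keeps its true magnitude, so the angle with the largest score really is the best-aligned one.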