Preprocessing an image for MNIST OCR
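I'm working on an OCR application in Python to read digits. I'm using OpenCV to find the contours in an image, crop it, and then preprocess the crop to 28x28 for use with the MNIST dataset. My images are not square, so I seem to lose a lot of quality when I resize them. Are there any tips or suggestions I could try?
I have already tried some techniques from http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html, such as dilation and opening. That didn't improve things; it only made the image blurry...
This is the code I use (find the contour, crop it, resize it, then threshold, and then center it):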
import numpy as np
import cv2
import imutils
from imutils.perspective import four_point_transform
from scipy import ndimage

images = np.zeros((4, 784))
correct_vals = np.zeros((4, 10))
i = 0


def getBestShift(img):
    # offset that moves the center of mass to the center of the image
    cy, cx = ndimage.center_of_mass(img)
    rows, cols = img.shape
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)
    return shiftx, shifty


def shift(img, sx, sy):
    # translate the image by (sx, sy) pixels
    rows, cols = img.shape
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted


for no in [1, 3, 4, 5]:
    image = cv2.imread("images/" + str(no) + ".jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 200)

    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
    # grab_contours handles the different return shapes of OpenCV 2, 3 and 4
    cnts = imutils.grab_contours(cnts)
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    displayCnt = None

    for c in cnts:
        # approximate the contour
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)

        # if the contour has four vertices, then we have found
        # the thermostat display
        if len(approx) == 4:
            displayCnt = approx
            break

    if displayCnt is None:
        # no four-sided contour found in this image, skip it
        continue

    warped = four_point_transform(gray, displayCnt.reshape(4, 2))

    # invert, resize to 28x28, then binarize with Otsu's threshold
    gray = cv2.resize(255 - warped, (28, 28))
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # strip empty (all-black) rows and columns around the digit
    while np.sum(gray[0]) == 0:
        gray = gray[1:]
    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)
    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]
    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)

    # scale the longer side to 20 pixels, keeping the aspect ratio
    rows, cols = gray.shape
    if rows > cols:
        factor = 20.0 / rows
        rows = 20
        cols = int(round(cols * factor))
        gray = cv2.resize(gray, (cols, rows))
    else:
        factor = 20.0 / cols
        cols = 20
        rows = int(round(rows * factor))
        gray = cv2.resize(gray, (cols, rows))

    # pad back to 28x28
    colsPadding = (int(np.ceil((28 - cols) / 2.0)), int(np.floor((28 - cols) / 2.0)))
    rowsPadding = (int(np.ceil((28 - rows) / 2.0)), int(np.floor((28 - rows) / 2.0)))
    gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')

    # center the digit by its center of mass
    shiftx, shifty = getBestShift(gray)
    shifted = shift(gray, shiftx, shifty)
    gray = shifted

    cv2.imwrite("processed/" + str(no) + ".png", gray)
    cv2.imshow("imgs", gray)
    cv2.waitKey(0)
When resizing the image, make sure you choose the interpolation that best suits your needs. For this, I recommend:
gray = cv2.resize(255 - warped, (28, 28), interpolation=cv2.INTER_AREA)
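You can see a comparison of the methods here: http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/ but since there are only a few methods, you can try them all and see which gives the best result. The default appears to be INTER_LINEAR.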