
Preprocessing an image for MNIST OCR

I'm working on an OCR application in Python to read digits. I'm using OpenCV to find the contours in an image, crop them, and then preprocess the result to 28x28 for the MNIST dataset. My images are not square, so I seem to lose a lot of quality when I resize them. Any tips or suggestions I could try?

This is the original image

This is after editing it

And this is the quality it should be

I've tried some tricks from http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html , like dilation and opening. But it doesn't make the result any better; it only makes it blurrier...

This is the code I'm using (find the contour, crop it, resize it, threshold it, and then center it):

import numpy as np
import cv2
import imutils
import scipy
from imutils.perspective import four_point_transform
from scipy import ndimage

images = np.zeros((4, 784))
correct_vals = np.zeros((4, 10))

i = 0


def getBestShift(img):
    # scipy.ndimage.measurements is deprecated; use the top-level function
    cy, cx = ndimage.center_of_mass(img)

    rows, cols = img.shape
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)

    return shiftx, shifty


def shift(img, sx, sy):
    rows, cols = img.shape
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted


for no in [1, 3, 4, 5]:
    image = cv2.imread("images/" + str(no) + ".jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 200, 255)

    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
    # grab_contours handles the differing return signatures of OpenCV 2/3/4
    cnts = imutils.grab_contours(cnts)
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    displayCnt = None

    for c in cnts:
        # approximate the contour
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)

        # if the contour has four vertices, then we have found
        # the thermostat display
        if len(approx) == 4:
            displayCnt = approx
            break

    warped = four_point_transform(gray, displayCnt.reshape(4, 2))
    gray = cv2.resize(255 - warped, (28, 28))
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)


    while np.sum(gray[0]) == 0:
        gray = gray[1:]

    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)

    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]

    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)

    rows, cols = gray.shape

    if rows > cols:
        factor = 20.0 / rows
        rows = 20
        cols = int(round(cols * factor))
        gray = cv2.resize(gray, (cols, rows))

    else:
        factor = 20.0 / cols
        cols = 20
        rows = int(round(rows * factor))
        gray = cv2.resize(gray, (cols, rows))

    colsPadding = (int(np.ceil((28 - cols) / 2.0)), int(np.floor((28 - cols) / 2.0)))
    rowsPadding = (int(np.ceil((28 - rows) / 2.0)), int(np.floor((28 - rows) / 2.0)))
    gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')

    shiftx, shifty = getBestShift(gray)
    shifted = shift(gray, shiftx, shifty)
    gray = shifted

    cv2.imwrite("processed/" + str(no) + ".png", gray)
    cv2.imshow("imgs", gray)
    cv2.waitKey(0)

When you resize the image, make sure you select the interpolation that best suits your needs. For this, I recommend:

gray = cv2.resize(255 - warped, (28, 28), interpolation=cv2.INTER_AREA)

which results in the image below after the rest of your processing.

You can see a comparison of methods here: http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/ , but since there are only a handful, you can try them all and see which gives the best results. It looks like the default is INTER_LINEAR.
