Proper image thresholding to prepare it for OCR in python using opencv

I am really new to opencv and a beginner to python.

I have this image:

[Image: original 24-bit BMP image]

I want to somehow apply proper thresholding to keep nothing but the 6 digits.

The bigger picture is that I intend to try to perform manual OCR to the image for each digit separately, using the k-nearest neighbours algorithm on a per digit level (kNearest.findNearest)

The problem is that I cannot clean up the digits sufficiently, especially the '7' digit which has this blue-ish watermark passing through it.

The steps I have tried so far are the following:

I am reading the image from disk

# IMREAD_UNCHANGED is -1
image = cv2.imread(sys.argv[1], cv2.IMREAD_UNCHANGED)

Then I'm keeping only the blue channel to get rid of the blue watermark around digit '7', effectively converting it to a single channel image

image = image[:,:,0] 
# opened with -1, which means "as is",
# so the blue channel is the first one in BGR

[Image: single-channel image (captioned as red only)]
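A small numeric illustration of why the blue channel helps here: a blue-ish pixel has a high blue value, so in the blue channel it sits close to the white background, whereas a weighted grayscale conversion pulls it noticeably darker. The three pixel values below are made up for the example and are not measured from the actual image.

```python
import numpy as np

# Three illustrative BGR pixels (assumed values, not taken from the image):
# white background, blue-ish watermark, dark digit stroke.
pixels = np.array([[[255, 255, 255], [230, 150, 140], [40, 40, 40]]],
                  dtype=np.uint8)

# Channel 0 is blue in OpenCV's BGR ordering.
blue = pixels[:, :, 0]

# Common luma weights for B, G, R, as used by typical grayscale conversions.
gray = np.rint(pixels @ np.array([0.114, 0.587, 0.299])).astype(np.uint8)

print("blue channel:", blue[0])
print("grayscale:   ", gray[0])
```

In the blue channel the watermark pixel stays much nearer to the background value than in the grayscale version, while the digit pixel stays dark in both.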

Then I'm multiplying it a bit to increase contrast between the digits and the background:

image = cv2.multiply(image, 1.5)

[Image: multiplied image with increased contrast]

Finally I perform Binary+Otsu thresholding:

_,thressed1 = cv2.threshold(image,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

[Image: Otsu binary thresholded image]

As you can see the end result is pretty good except for the digit '7' which has kept a lot of noise.

How can I improve the end result? Please include an example result image where possible; it is easier to understand than code snippets alone.

You can try applying medianBlur to the gray (blue-channel) image with two different kernel sizes (such as 3 and 51), dividing one blurred result by the other, and thresholding the quotient. Something like this:

[Image]


#!/usr/bin/python3
# 2018/09/23 17:29 (CST) 
# (Happy Mid-Autumn Festival)

import cv2 
import numpy as np 

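fname = "color.png"
bgray = cv2.imread(fname)[..., 0]  # blue channel (OpenCV loads images as BGR)

# A small median kernel removes speckle; a large one estimates the background.
blured1 = cv2.medianBlur(bgray, 3)
blured2 = cv2.medianBlur(bgray, 51)
# Dividing by the background estimate evens out the background intensity.
divided = np.ma.divide(blured1, blured2).data
normed = np.uint8(255 * divided / divided.max())
# With THRESH_OTSU the threshold is chosen automatically; the 100 is ignored.
th, threshed = cv2.threshold(normed, 100, 255, cv2.THRESH_OTSU)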

dst = np.vstack((bgray, blured1, blured2, normed, threshed)) 
cv2.imwrite("dst.png", dst)

The result:

[Image]

It doesn't seem easy to completely remove the annoying stamp.

What you can do is flatten the background intensity by

  • computing a lowpass image (Gaussian filter, morphological closing); the filter size should be a little larger than the character size;

  • dividing the original image by the lowpass image.

Then you can use Otsu.

[Image]

As you see, the result isn't perfect.

Why not just keep values in the image that are above a certain threshold?

Like this:

import cv2
import numpy as np

img = cv2.imread("./a.png")[:,:,0]  # the last readable image

# Pixels below 100 become black and everything else white; casting to uint8
# keeps the result in a dtype that cv2.imwrite accepts.
new_img = np.where(img < 100, 0, 255).astype(np.uint8)

cv2.imwrite("./b.png", new_img)

Looks great:

You could probably play with the threshold even more and get better results.
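One way to experiment with the threshold is to generate several candidates at once and compare them side by side. The sketch below is only an illustration; the helper name `sweep_thresholds` and the candidate values are assumptions.

```python
import numpy as np


def sweep_thresholds(gray, thresholds=(80, 100, 120, 140)):
    """Return {threshold: binary image}, with pixels below the threshold set to 0."""
    results = {}
    for t in thresholds:
        # Same rule as the fixed threshold: below t becomes 0, the rest 255.
        results[t] = np.where(gray < t, 0, 255).astype(np.uint8)
    return results
```

Each result can then be written out with `cv2.imwrite` under a file name that includes the threshold, which makes it easy to pick the cleanest one by eye.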

I tried a slightly different approach than Yves on the blue channel:

[Image: blue channel]

  • Apply a median filter (r=2):

[Image: filtered image]

  • Use edge detection (e.g. the Sobel operator):

[Image: detected edges]

  • Automatic thresholding (Otsu)

[Image: thresholded image]

  • Closing of the image

[Image: closed image]

This approach seems to make the output a little less noisy. However, one has to address the holes in the numbers. This can be done by detecting black contours which are completely surrounded by white pixels and simply filling them with white.
