Proper image thresholding to prepare an image for OCR in Python using OpenCV

I am really new to OpenCV and a beginner in Python.

I have this image:

[Image: original 24-bit BMP image]

I want to apply proper thresholding so that nothing but the 6 digits remains.

The bigger picture is that I intend to perform manual OCR on each digit separately, using the k-nearest neighbours algorithm at the per-digit level (kNearest.findNearest).

The problem is that I cannot clean up the digits sufficiently, especially the digit '7', which has a blue-ish watermark passing through it.

The steps I have tried so far are the following:

I read the image from disk:

import sys
import cv2
# IMREAD_UNCHANGED is -1
image = cv2.imread(sys.argv[1], cv2.IMREAD_UNCHANGED)

Then I keep only the blue channel to get rid of the blue watermark around the digit '7', which effectively converts the image to a single-channel image:

image = image[:, :, 0]
# opened with -1, which means "as is",
# so the blue channel is the first one (BGR order)

[Image: single channel - red only - image]

Then I multiply it a bit to increase the contrast between the digits and the background:

image = cv2.multiply(image, 1.5)

[Image: multiplied image with increased contrast]

Finally, I perform Binary + Otsu thresholding:

_,thressed1 = cv2.threshold(image,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

[Image: Otsu binary thresholded image]

As you can see, the end result is pretty good except for the digit '7', which has kept a lot of noise.

How can I improve the end result? Please supply an example result image where possible; it is easier to understand than code snippets alone.

You can try applying medianBlur to the gray image with different kernel sizes (such as 3 and 51), dividing one blurred result by the other, and thresholding the quotient. Something like this:

[Image]

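#!/usr/bin/python3
# 2018/09/23 17:29 (CST)
# (Happy Mid-Autumn Festival)

import cv2
import numpy as np

fname = "color.png"
# Keep only the blue channel (index 0 in OpenCV's BGR order)
bgray = cv2.imread(fname)[..., 0]

# The small kernel removes speckle; the large kernel estimates the background
blured1 = cv2.medianBlur(bgray, 3)
blured2 = cv2.medianBlur(bgray, 51)
# Divide by the background; masked division avoids inf/nan where the divisor is 0
divided = np.ma.divide(blured1, blured2).data
# Rescale the ratio to the 0..255 range
normed = np.uint8(255 * divided / divided.max())
# With THRESH_OTSU the threshold is computed automatically; the 100 is ignored
th, threshed = cv2.threshold(normed, 100, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Stack all intermediate images vertically for comparison
dst = np.vstack((bgray, blured1, blured2, normed, threshed))
cv2.imwrite("dst.png", dst)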

The result:

[Image]

It doesn't seem easy to completely remove the annoying stamp.

What you can do is flatten the background intensity by

  • computing a lowpass image (Gaussian filter, morphological closing); the filter size should be a little larger than the character size;

  • dividing the original image by the lowpass image.

Then you can use Otsu.

[Image]

As you can see, the result isn't perfect.

Why not just keep the values in the image that are above a certain threshold?

Like this:

import cv2
import numpy as np

img = cv2.imread("./a.png")[:, :, 0]  # the last readable image

# Pixels below 100 become black (0), everything else becomes white (255)
new_img = np.where(img < 100, 0, 255).astype(np.uint8)

cv2.imwrite("./b.png", new_img)

Looks great:

You could probably play with the threshold some more and get better results.

I tried a slightly different approach than Yves, working on the blue channel:

[Image: blue channel]

  • Apply a median filter (r=2):

[Image: filtered image]

  • Use edge detection (e.g. the Sobel operator):

[Image: detected edges]

  • Automatic thresholding (Otsu):

[Image: thresholded image]

  • Closing of the image:

[Image: closed image]

This approach seems to make the output a little less noisy. However, one has to address the holes in the digits. This can be done by detecting black contours that are completely surrounded by white pixels and simply filling them with white.
