Improve the quality of the letters in an image

I'm working with images that contain text. The images are receipts, and after a lot of transformations the text has lost quality. I'm using Python and OpenCV. I tried many combinations of the morphological transformations from the OpenCV doc "Morphological Transformations", but I don't get satisfactory results.

This is what I'm doing right now (the things I've tried are left commented out; only the lines I'm actually using are uncommented):

import cv2
import numpy as np

# img is the receipt image, already loaded
kernel = np.ones((2, 2), np.uint8)
# opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
# closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
# dilation = cv2.dilate(opening, kernel, iterations=1)
# kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(img, kernel, iterations=1)
# gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)

img = erosion.copy()

With this, from this original image:

[image]

I get this:

[image]

It's a little better, as you can see, but it's still too bad. The OCR (tesseract) doesn't recognize the characters here very well. I've trained it, but as you can see, every "e" is different, and so on.

I get good results, but I think they would be even better if I solved this problem.

Maybe I can do something else, or use a better combination of the morphological transformations. If there is another tool (PIL, ImageMagick, etc.) that would help, I can use it.

Here's the whole image, so you can see how it looks:

[image]

As I said, it's not so bad, but a little more "optimization" of the letters would be perfect.

After years working on this topic, I can now say that what I wanted to do takes a big effort, is quite slow, and NEVER worked as I expected. The irregularities of the pixels in the characters are always unpredictable, which is why "easy algorithms" don't work.

Question: Is it impossible, then, to have a decent OCR that can read damaged characters?

Answer: No, it's not impossible. But it takes "a bit" more than just using erosion, morphological closing or something like that.

Then, how? Neural Networks :)

Here are two amazing papers that helped me a lot:

Can we build language-independent OCR using LSTM networks?

Reading Scene Text in Deep Convolutional Sequences

And for those who aren't familiar with RNNs, I can suggest this:

Understanding LSTM Networks

There's also a Python library that works pretty well (and unfortunately even better for C++):

ocropy

I really hope this can help someone.

Did you consider taking the neighboring pixels and adding them up?

For example:

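import cv2
import numpy

# Each 3x3 kernel has a single nonzero element, so eroding with it
# simply picks the neighbor in that direction (a one-pixel shift).
n = numpy.zeros((3, 3), numpy.uint8)
s = numpy.zeros((3, 3), numpy.uint8)
w = numpy.zeros((3, 3), numpy.uint8)
e = numpy.zeros((3, 3), numpy.uint8)

n[0][1] = 1  # neighbor above
s[2][1] = 1  # neighbor below
w[1][0] = 1  # neighbor to the left
e[1][2] = 1  # neighbor to the right

img_n = cv2.erode(img, n, iterations=1)
img_s = cv2.erode(img, s, iterations=1)
img_w = cv2.erode(img, w, iterations=1)
img_e = cv2.erode(img, e, iterations=1)

result = img_n + img_s + img_w + img_e + img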

Also, you can use either numpy or cv2 to add the arrays.

In my experience, erosion impairs OCR quality. If you have a grayscale image (not binary), you can use a better binarization algorithm; I use the Sauvola algorithm. If you only have a binary image, the best thing you can do is remove the noise (remove all small dots).

I found the Ramer–Douglas–Peucker algorithm, and I'm trying to implement it for closed polygons in Haskell. Maybe it can solve something.
