Improve image quality using Emgu.CV

I am trying to read text from images using Tesseract. Because the image quality is poor, I preprocess the image before running tesseract-ocr to improve accuracy.

Image link: https://i.stack.imgur.com/ZaA8M.jpg

I wrote the code below to smooth the image with a Gaussian filter and then threshold it:

var img = new Image<Gray, Byte>("ZaA8M.jpg");
Image<Gray, byte> Img_Source_Gray = img.Copy();
Image<Gray, byte> Img_Egde_Gray = Img_Source_Gray.CopyBlank();
Image<Gray, byte> Img_SourceSmoothed_Gray = Img_Source_Gray.CopyBlank();
Image<Gray, byte> Img_Otsu_Gray = img.CopyBlank();

//Smooth the edges by using a Gaussian filter
Img_SourceSmoothed_Gray = Img_Source_Gray.SmoothGaussian(3);

//Segmentation of the characters by using a threshold
CvInvoke.Threshold(Img_Source_Gray, Img_Otsu_Gray, 255, 0, Emgu.CV.CvEnum.ThresholdType.Otsu | Emgu.CV.CvEnum.ThresholdType.Binary);

Bitmap imgb = (Bitmap)byteArrayToImage(Img_Otsu_Gray.ToJpegData());
imgb.Save("test.jpg");

Output image link: https://i.stack.imgur.com/Bj8XX.jpg

After applying the threshold, the characters become darker and are no longer readable.

How can I improve the quality of the image above so that tesseract-ocr can read the text from it?

I think you are doing too much image processing. For example, your original image already has letters that touch each other. By applying a SmoothGaussian filter, you are likely blending those edges into each other even more, although you do smooth out some irregularities elsewhere. I also prefer working with PNG images instead of JPG when the input is low quality, because JPG compression adds another layer of quality loss.

I applied a single filter to your image with well-chosen parameters (VB.NET):

img.ThresholdAdaptive(New Gray(255), CvEnum.AdaptiveThresholdType.GaussianC, CvEnum.ThresholdType.Binary, 45, New Gray(9))
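Since the question is written in C#, the same call might look roughly like this there. This is only a sketch: the class name `OcrPreprocessing`, the method names `Binarize` and `BinarizeFile`, and the output file name used in the usage note are hypothetical, and the block size (45) and constant (9) are simply copied from the VB.NET line above.

```csharp
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// Hypothetical helper class, not part of Emgu.CV.
public static class OcrPreprocessing
{
    // Applies one adaptive threshold with the parameters from the VB.NET call:
    // max value 255, Gaussian-weighted neighborhood, binary output,
    // block size 45 (must be odd), constant 9 subtracted from the local mean.
    public static Image<Gray, byte> Binarize(Image<Gray, byte> source)
    {
        return source.ThresholdAdaptive(
            new Gray(255),
            AdaptiveThresholdType.GaussianC,
            ThresholdType.Binary,
            45,
            new Gray(9));
    }

    // Loads an image as grayscale, binarizes it, and saves the result.
    // Use a .png output path to avoid a second round of JPG compression loss.
    public static void BinarizeFile(string inputPath, string outputPath)
    {
        using (var source = new Image<Gray, byte>(inputPath))
        using (var binary = Binarize(source))
        {
            binary.Save(outputPath);
        }
    }
}
```

For example, `OcrPreprocessing.BinarizeFile("ZaA8M.jpg", "preprocessed.png")` would write a PNG that can then be passed to Tesseract. The best block size and constant depend on the image, so treat 45 and 9 as a starting point.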

This is the resulting image:

[Image: image after adaptive filter]

Here is my Tesseract result:

Supplement Facts
Serving Size 30 Drops (1 ml)
Servings Per Container 30

Amount Per Serving 333mg
Herb Weight Equivalence

Organic
Milk Thistle Seed
(Silybum marianum)

Herb Strength Ratio 1:3

* Daily Value Not Established

OTHER INGREDIENTS: :
ionized Water, Grain Alcohol
(20-30% by volume) |

— 4

I used this Tesseract v5.0.0 command line:

tesseract.exe --oem 3 --psm 3 -l eng "c:\temp\input.png" "C:\temp\output"
