Improving image quality with Emgu.CV

Question

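I am trying to read text from an image using Tesseract. Because the image quality is poor, I preprocess the image before running tesseract-ocr to improve accuracy.

Image link: https://i.stack.imgur.com/ZaA8M.jpg

I wrote the following code, which smooths the image with a Gaussian filter and then applies a threshold: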

var img = new Image<Gray, Byte>("ZaA8M.jpg");
Image<Gray, byte> Img_Source_Gray = img.Copy();
Image<Gray, byte> Img_Egde_Gray = Img_Source_Gray.CopyBlank();
Image<Gray, byte> Img_SourceSmoothed_Gray = Img_Source_Gray.CopyBlank();
Image<Gray, byte> Img_Otsu_Gray = img.CopyBlank();

//Smooth the edges by using a Gaussian filter
Img_SourceSmoothed_Gray = Img_Source_Gray.SmoothGaussian(3);

//Segmentation of the characters by using a threshold
CvInvoke.Threshold(Img_Source_Gray, Img_Otsu_Gray, 255, 0, Emgu.CV.CvEnum.ThresholdType.Otsu | Emgu.CV.CvEnum.ThresholdType.Binary);

Bitmap imgb = (Bitmap)byteArrayToImage(Img_Otsu_Gray.ToJpegData());
imgb.Save("test.jpg");

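Output image link: https://i.stack.imgur.com/Bj8XX.jpg

After applying the threshold, the characters become darker and unreadable.

How can I improve the quality of the image above so that tesseract-ocr can read the text from it?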

Answer 1

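I think you are doing too much image processing. For example, your original image already has letters that touch each other. By applying the SmoothGaussian filter, you probably make those edges blend together even more, even though you are smoothing out some irregularities elsewhere. When the input quality is low, I also prefer to work with PNG images rather than JPG, because JPG compression adds another layer of quality loss.

I applied a single filter to your image, with carefully chosen parameters (VB.NET):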

img.ThresholdAdaptive(New Gray(255), CvEnum.AdaptiveThresholdType.GaussianC, CvEnum.ThresholdType.Binary, 45, New Gray(9))

Here is my resulting image:

[Image: the picture after adaptive thresholding]

Here is my Tesseract result:

Supplement Facts
Serving Size 30 Drops (1 ml)
Servings Per Container 30

Amount Per Serving 333mg
Herb Weight Equivalence

Organic
Milk Thistle Seed
(Silybum marianum)

Herb Strength Ratio 1:3

* Daily Value Not Established

OTHER INGREDIENTS: :
ionized Water, Grain Alcohol
(20-30% by volume) |

— 4

I used this Tesseract v5.0.0 command line:

tesseract.exe --oem 3 --psm 3 -l eng "c:\temp\input.png" "C:\temp\output"
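
Since the question's code is in C#, the VB.NET call above could be written in C# roughly as follows. This is only a minimal sketch, assuming the Emgu.CV 3.x or later API: the class name `OcrPreprocessing` and the method name `Binarize` are hypothetical and introduced just for illustration, and the block size of 45 and offset of 9 are simply the values from the VB.NET line, which would likely need retuning for other images.

```csharp
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// Hypothetical helper class, not part of Emgu.CV or of the original answer.
public static class OcrPreprocessing
{
    // Binarizes a grayscale image with a Gaussian-weighted adaptive threshold.
    // Each pixel is compared against the weighted mean of its
    // blockSize x blockSize neighbourhood minus the offset, so the cut-off
    // follows local lighting instead of being one global value.
    // blockSize must be odd and greater than 1.
    public static Image<Gray, byte> Binarize(
        Image<Gray, byte> source, int blockSize = 45, double offset = 9)
    {
        return source.ThresholdAdaptive(
            new Gray(255),                    // value assigned to pixels above the local threshold
            AdaptiveThresholdType.GaussianC,  // Gaussian-weighted neighbourhood mean
            ThresholdType.Binary,
            blockSize,
            new Gray(offset));                // constant subtracted from the local mean
    }
}
```

A possible use would be to load the image with `new Image<Gray, byte>("ZaA8M.jpg")`, pass it to `OcrPreprocessing.Binarize`, and call `Save("input.png")` on the result, so that Tesseract receives a PNG and no second round of JPG compression is applied.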
