
How can I optimally preprocess images for Tesseract in C# when the grayscaled text color "interferes" with the background color?

I'm struggling to find an optimal binarization as a preprocessing step for OCR (Tesseract in C#).

The images are 1624 × 1728 pixels and contain car GUI elements (buttons, sliders, info boxes) and corresponding text from a car navigation/command interface (different use-case scenarios such as radio control, car control, etc.). The images contain multiple colors; most of them are predominantly dark blue, and the text is white/gray or close to white. Unfortunately, I cannot share the images due to data privacy.

Problem: I cannot separate the text from the background in an efficient way (text should become black, everything else white), because the text color covers a wide range and partially coincides with the background color (speaking of the grayscaled images).

Current procedure: First I convert the RGB image from System.Drawing.Image to OpenCvSharp.Mat. Then I convert the Mat from color to grayscale, and from grayscale to binarized.

This is the main code for the binarization:

Mat binarized = grayscaled.Threshold(tresh, maxVal, ThresholdTypes.BinaryInv);

I use 255 as maxVal. If I use tresh = 90, the binarized image looks OK overall (even though the Tesseract results are bad here), but some pixels of the bottom control elements' text (and some other text) turn white, because the threshold is too high (so some text characters are unsharp and incomplete).

If I use tresh = 40, the characters of the bottom control elements become complete and sharp (as they should be), but the background (the middle of the image) becomes completely black, which means some of the text there disappears inside a big black chunk. So the problem is the wide range of text pixel values in the grayscaled image, which "interferes" with the values of other elements or the background and makes the text extraction hard.

Note: I have already tried adaptive thresholding (MeanC and GaussianC) with different thresholds, kernel sizes, and mean subtraction constants, without good results.

Question: What would be an efficient solution for the preprocessing?

I'm thinking about writing a method that binarizes from RGB instead of from grayscale. The method would take an RGB image as input, map the white-ish text color range to black, and everything else to white.
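This color-range idea can be sketched as follows. Below is a Python/NumPy sketch (in OpenCvSharp the equivalent would be Cv2.InRange on the color Mat followed by an inversion); the function name and the min_brightness / max_chroma thresholds are my own illustrative guesses and would need tuning on the real images:

```python
import numpy as np

def binarize_white_text(rgb, min_brightness=180, max_chroma=30):
    """Map near-white/near-gray pixels (candidate text) to black (0)
    and everything else to white (255).

    rgb: H x W x 3 uint8 array. Threshold values are illustrative
    guesses, not validated against the asker's images.
    """
    rgb = rgb.astype(np.int16)
    brightness = rgb.mean(axis=2)                  # per-pixel average intensity
    chroma = rgb.max(axis=2) - rgb.min(axis=2)     # channel spread; low => gray/white
    text_mask = (brightness >= min_brightness) & (chroma <= max_chroma)
    out = np.full(rgb.shape[:2], 255, dtype=np.uint8)
    out[text_mask] = 0                             # candidate text becomes black
    return out
```

The chroma condition is what distinguishes this from plain gray thresholding: a bright but saturated pixel (e.g. a light blue button) has a large channel spread and stays white, while white/gray text has nearly equal channels and becomes black.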

One approach is to remove any frequencies in the image lower than that of your text. This can be done by creating a blurred copy of the image, with a kernel a bit larger than your text, and subtracting this blurred image from the original. This keeps the high frequencies, i.e. text and other edges, while removing any vignetting or other gradients across the image. Keep in mind that the resulting image will have a different range of values, some of which will probably be negative.
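The blur-and-subtract idea above can be sketched like this in Python/NumPy (in OpenCvSharp you would blur a float Mat with Cv2.Blur or Cv2.GaussianBlur and use Cv2.Subtract); the kernel radius k and the bias constant are my own illustrative choices:

```python
import numpy as np

def box_blur(gray, k):
    """Mean filter with a (2k+1) x (2k+1) window via an integral image.
    Edges are handled by padding with the edge values."""
    p = np.pad(gray.astype(np.float64), k, mode="edge")
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    w = 2 * k + 1
    h, wd = gray.shape
    s = ii[w:w+h, w:w+wd] - ii[:h, w:w+wd] - ii[w:w+h, :wd] + ii[:h, :wd]
    return s / (w * w)

def highpass_then_threshold(gray, k=15, bias=10):
    """Subtract a heavily blurred copy (the low frequencies) from the
    image, then threshold the residual. k should be a bit larger than
    the text stroke width; bias suppresses noise. Both are guesses
    that need tuning."""
    residual = gray.astype(np.float64) - box_blur(gray, k)
    # bright text on a dark background gives a positive residual
    return np.where(residual > bias, 0, 255).astype(np.uint8)  # text -> black
```

Because the residual is signed, the threshold is applied to the float residual directly rather than after converting back to uint8, which would clip the negative values the answer warns about.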

Another option would be to split the image into sections, and use different thresholds in each, but that may lead to artifacts at the section boundaries.
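A minimal sketch of that per-section idea, assuming Otsu's method is used to pick each section's threshold (in OpenCvSharp, Cv2.Threshold with ThresholdTypes.Otsu on each sub-Mat would play the same role); the tile count is an arbitrary choice:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    omega = np.cumsum(hist) / total                 # class-0 probability
    mu = np.cumsum(hist * np.arange(256)) / total   # cumulative mean
    mu_t = mu[-1]                                   # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

def tiled_binarize(gray, tiles=4):
    """Split the image into tiles x tiles sections and apply a separate
    Otsu threshold in each; bright pixels (text) become black."""
    out = np.empty_like(gray)
    h, w = gray.shape
    ys = np.linspace(0, h, tiles + 1, dtype=int)
    xs = np.linspace(0, w, tiles + 1, dtype=int)
    for y0, y1 in zip(ys[:-1], ys[1:]):
        for x0, x1 in zip(xs[:-1], xs[1:]):
            block = gray[y0:y1, x0:x1]
            t = otsu_threshold(block)
            out[y0:y1, x0:x1] = np.where(block > t, 0, 255)
    return out
```

Note that a tile containing no text (a near-constant region) will still be split into two classes, which is exactly the kind of boundary artifact the answer warns about; masking out low-variance tiles would be one way to mitigate it.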
