Extract white text with a black border from an image with a complex background

Question

I need to extract text from various screenshots whose backgrounds can be any colour. The text style is constant: always white with a black border. These are some examples:

And this is the code I'm using right now:

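import cv2
import numpy as np
import pytesseract
from PIL import Image

# No space after "=", otherwise the quoted whitelist is split off as a separate argument.
custom_config = r"--oem 3 --psm 11 -c tessedit_char_whitelist='ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 '"


def preprocess_finale(im):
    # Smooth noise while keeping edges sharp.
    im = cv2.bilateralFilter(im, 5, 55, 60)
    # PIL loads images as RGB, not OpenCV's BGR.
    im = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)
    # Inverted threshold: near-white text becomes black on a white background.
    _, im = cv2.threshold(im, 240, 255, cv2.THRESH_BINARY_INV)
    return im


# i is the path of the current screenshot (defined elsewhere).
img = np.array(Image.open(i).convert("RGB"))
im = preprocess_finale(img)

text = pytesseract.image_to_string(im, lang='ita', config=custom_config)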

But the results are still not accurate at all. How can I improve my code?

Thank you all

Answer 1

Binarize and find all white blobs with width and height in a suitable range. Then you can cluster the bounding boxes that are horizontally aligned, and sort the blobs horizontally in every cluster.

1 answer

solution1
0 2022-07-29 19:31:25
