
Extract white text with a black border from an image with a complex background

I need to extract text from various screenshots whose backgrounds can be any colour, but the text style is constant: it is always white with a black border. These are some examples:

[image]

[image]

And this is the code I'm using right now:

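import cv2
import numpy as np
import pytesseract
from PIL import Image

# No space after "=": otherwise the whitelist value is parsed as a separate argument.
custom_config = r"--oem 3 --psm 11 -c tessedit_char_whitelist='ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 '"

def preprocess_finale(im):
    im = cv2.bilateralFilter(im, 5, 55, 60)
    # PIL loads images as RGB, not BGR.
    im = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)
    # Inverse binary threshold: near-white pixels (> 240) become black, the rest white.
    _, im = cv2.threshold(im, 240, 255, cv2.THRESH_BINARY_INV)
    return im

# i is the path of the screenshot being processed (defined elsewhere).
img = np.array(Image.open(i).convert("RGB"))
im = preprocess_finale(img)

text = pytesseract.image_to_string(im, lang='ita', config=custom_config)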

But the results are still not accurate at all. How can I improve my code?

Thank you all

Binarize and find all white blobs with width and height in a suitable range. Then you can cluster the bounding boxes that are horizontally aligned, and sort the blobs horizontally in every cluster.

[image]
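A rough sketch of how this blob approach might look in Python is given here. It is not the answerer's code: the function names, the threshold of 240 (borrowed from the question), the size limits, and the row tolerance are all assumptions that would need tuning on real screenshots. The blob detection is assumed to use OpenCV's connected-components analysis; the filtering and grouping helpers are plain Python.

```python
def white_blob_boxes(gray, thresh=240):
    """Binarize a grayscale image; return (x, y, w, h) for each white blob."""
    import cv2  # local import: the helpers below do not need OpenCV

    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    _, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    # Row 0 of stats is the background; the columns are x, y, w, h, area.
    return [tuple(int(v) for v in row[:4]) for row in stats[1:]]


def filter_by_size(boxes, min_w=3, max_w=60, min_h=10, max_h=60):
    """Keep boxes whose width and height fall in a plausible glyph range.

    The default limits are placeholders, not values from the answer.
    """
    return [b for b in boxes
            if min_w <= b[2] <= max_w and min_h <= b[3] <= max_h]


def group_into_lines(boxes, tolerance=0.5):
    """Cluster boxes with similar vertical centres, then sort each cluster by x."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        _, y, _, h = box
        cy = y + h / 2
        if lines and abs(cy - lines[-1]["cy"]) <= tolerance * h:
            line = lines[-1]
            line["boxes"].append(box)
            # Update the running mean of the cluster's vertical centre.
            line["cy"] += (cy - line["cy"]) / len(line["boxes"])
        else:
            lines.append({"cy": cy, "boxes": [box]})
    return [sorted(line["boxes"], key=lambda b: b[0]) for line in lines]
```

With a grayscale image `gray`, one possible use is `group_into_lines(filter_by_size(white_blob_boxes(gray)))`. Each returned list then holds the boxes of one text line from left to right, which could be cropped and passed to the OCR step.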


 