
How to classify whether images contain text or not?

I have a lot of images extracted from a search engine, and I am using OCR to perform decent text extraction from these images, but some of the images do not contain text.

Thus I would like to determine in Python whether an image contains any text at all; if it doesn't, I wouldn't have to perform OCR on it. Ideally this method would have high recall.

Use pytesseract. Something like this:

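from PIL import Image
import pytesseract

def contains_text(image_path):
    """Return the recognized text, or False if the image has none."""
    # Open with a context manager so the file handle is closed promptly.
    with Image.open(image_path) as image:
        text = pytesseract.image_to_string(image)

    # OCR output can be whitespace only, so strip it before testing.
    text = text.strip()
    if not text:
        return False  # No text detected
    return text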

I do not know of a way to detect that there is no text without trying to perform OCR (like above).
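Since the question asks for high recall, the decision rule applied to the OCR output matters as much as the OCR call itself. Below is a minimal sketch of that rule on its own, assuming that the OCR output may contain only whitespace or control characters when nothing is recognized. The names `looks_like_text`, `split_by_text` and `min_chars` are hypothetical helpers for illustration, not part of pytesseract. The `ocr` argument is any callable that maps an image path to a string, for example a wrapper around `pytesseract.image_to_string`.

```python
def looks_like_text(ocr_output, min_chars=1):
    """Return True if the OCR output has at least `min_chars` letters or digits.

    Whitespace and control characters are ignored, so output that is
    only newlines or a form feed counts as "no text".
    """
    meaningful = sum(1 for ch in ocr_output if ch.isalnum())
    return meaningful >= min_chars


def split_by_text(paths, ocr, min_chars=1):
    """Partition `paths` into (with_text, without_text).

    `ocr` is any callable taking a path and returning the recognized string.
    """
    with_text, without_text = [], []
    for path in paths:
        if looks_like_text(ocr(path), min_chars):
            with_text.append(path)
        else:
            without_text.append(path)
    return with_text, without_text
```

Keeping `min_chars` at 1 favors recall: any single recognized letter or digit marks the image as containing text. Raising it filters out stray characters that OCR sometimes produces on textured images, at the cost of missing images with very short labels.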


 