Pytesseract OCR Bounding Box

I am trying to use pytesseract to OCR a labeled image. The labels come as an XML file containing the relevant bounding boxes, and I would like to check whether each label overlaps with text OCR'ed from the whole image by comparing bounding boxes. Is there a way to get the bounding box of each full word? The code below returns a bounding box for each letter instead. Can anyone recommend another way to do this, or an alternative OCR package in Python that can take the bounding boxes from my XML file and OCR within them (or explain how pytesseract could do this)? I don't need a code answer, just some advice.

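> import cv2
> import pytesseract
> img = cv2.imread(filename)
> h, w, _ = img.shape
> boxes = pytesseract.image_to_boxes(img)  # one line per character
> for b in boxes.splitlines():  # each line: char x1 y1 x2 y2 page, origin at bottom-left
>     b = b.split(' ')
>     img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)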

I expect the code to return only the relevant text occurring within the labels I have created. I can write the overlap check myself; I just need the complete bounding box of each word.

In case someone is still looking for an answer: pytesseract's image_to_data returns word-level results. For each word it gives the bounding box, the recognized text, and other information such as the confidence. You can find out more about the function's output here: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github
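A minimal sketch of how that could look, in case it helps. The helper names (`word_boxes`, `overlaps`, `ocr_word_boxes`) and the `min_conf` parameter are made up for illustration and are not part of pytesseract. It assumes the dict form of the output (`output_type=Output.DICT`), where the `left`, `top`, `width`, and `height` columns are measured from the top-left corner of the image, so no y-flip is needed as with image_to_boxes.

```python
def word_boxes(data, min_conf=0):
    """Convert the dict returned by image_to_data into (text, (x1, y1, x2, y2)) tuples.

    `data` is assumed to have the "text", "conf", "left", "top", "width",
    and "height" columns. Rows with empty text (page, block, paragraph,
    and line entries) are skipped.
    """
    words = []
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue
        # conf may arrive as a string, int, or float depending on the version
        if float(data["conf"][i]) < min_conf:
            continue
        x, y = data["left"][i], data["top"][i]
        w, h = data["width"][i], data["height"][i]
        words.append((text, (x, y, x + w, y + h)))
    return words


def overlaps(a, b):
    """Return True if two (x1, y1, x2, y2) rectangles intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def ocr_word_boxes(image):
    """Run Tesseract on an image and return its word-level boxes."""
    # Imported here so the helpers above work without Tesseract installed.
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    return word_boxes(data)
```

With the label rectangles parsed from the XML file into the same `(x1, y1, x2, y2)` form, something like `[text for text, box in ocr_word_boxes(img) if overlaps(label_box, box)]` would collect the words that fall on a given label.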
