
How to draw bounding boxes around words and save the crops to a folder with OpenCV in Python

I am following the GitHub repo https://github.com/mindee/doctr to detect text. I have converted the text coordinates to absolute coordinates in the form [xmin, ymin, xmax, ymax]. I want to draw bounding boxes using these values and save the cropped word images to a folder. How can I do that?

Input image

import json

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# Load the image
doc = DocumentFile.from_images("/content/passbook_64_0.jpeg")
# Analyze
result = model(doc)
# Export results in json
with open("/content/preds.json", "w") as f:
    json.dump(result.export(), f)
export = result.export()
# Flatten the export
page_words = [[word for block in page['blocks'] for line in block['lines'] for word in line['words']] for page in export['pages']]
page_dims = [page['dimensions'] for page in export['pages']]
# Get the coords in [xmin, ymin, xmax, ymax]
words_abs_coords = [
    [[int(round(word['geometry'][0][0] * dims[0])), int(round(word['geometry'][0][1] * dims[1])), int(round(word['geometry'][1][0] * dims[0])), int(round(word['geometry'][1][1] * dims[1]))] for word in words]
    for words, dims in zip(page_words, page_dims)

]
print(words_abs_coords)

The absolute coordinates produced by the code above:

[[[33, 108, 57, 135], [54, 107, 81, 136], [189, 110, 205, 141], [205, 112, 221, 141], [222, 114, 230, 141], [230, 112, 247, 141], [11, 173, 39, 196], [41, 175, 68, 196], [71, 175, 87, 198], [90, 177, 116, 198], [215, 179, 256, 199], [26, 204, 35, 225], [10, 203, 25, 227], [89, 204, 131, 228], [214, 207, 256, 227], [54, 228, 57, 236], [11, 225, 38, 246], [41, 224, 53, 247], [90, 225, 129, 245], [11, 244, 42, 265], [45, 245, 64, 267], [82, 246, 102, 267], [67, 246, 79, 268], [104, 247, 127, 268], [13, 301, 87, 324], [90, 303, 113, 323], [12, 327, 60, 349], [63, 331, 69, 347], [84, 331, 125, 349], [70, 328, 80, 351], [214, 334, 259, 356], [61, 360, 108, 378], [41, 357, 59, 382], [130, 360, 160, 381], [111, 359, 128, 382], [214, 362, 282, 386], [41, 388, 62, 411], [63, 388, 84, 411], [85, 388, 106, 411], [108, 387, 131, 410], [213, 392, 237, 415], [239, 393, 276, 418], [11, 415, 34, 439], [213, 419, 230, 444], [231, 419, 241, 444], [244, 422, 286, 447], [11, 443, 34, 467], [208, 441, 252, 477], [259, 451, 287, 476], [11, 474, 34, 497], [52, 471, 80, 496], [38, 470, 51, 498], [215, 478, 274, 501], [10, 501, 30, 525], [207, 505, 267, 531], [49, 531, 123, 555], [11, 536, 27, 559], [29, 534, 46, 562], [204, 536, 233, 560], [234, 538, 259, 562]]]
import matplotlib.pyplot as plt
import cv2
image = cv2.imread("/content/passbook_82_0.jpeg")
im_height, im_width, _ = image.shape
xmin=words_abs_coords[0][0][0]
ymin=words_abs_coords[0][0][1]
xmax=words_abs_coords[0][0][2]
ymax=words_abs_coords[0][0][3]
image1 = cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (0,255,0), 2)
plt.imshow(image1)

Output image

For anyone finding this thread: I believe this was already answered in the dedicated GitHub discussion at https://github.com/mindee/doctr/discussions/570

I think the only part to change in the snippet is this:

words_abs_coords = [
[[int(round(word['geometry'][0][0] * dims[0])), int(round(word['geometry'][0][1] * dims[1])), int(round(word['geometry'][1][0] * dims[0])), int(round(word['geometry'][1][1] * dims[1]))] for word in words]
for words, dims in zip(page_words, page_dims)

]

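The page dimensions are used in the wrong order: docTR exports them as (height, width), so x values must be scaled by dims[1] and y values by dims[0]. As pointed out in the discussion, changing it to: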

words_abs_coords = [
[[int(round(word['geometry'][0][0] * dims[1])), int(round(word['geometry'][0][1] * dims[0])), int(round(word['geometry'][1][0] * dims[1])), int(round(word['geometry'][1][1] * dims[0]))] for word in words]
for words, dims in zip(page_words, page_dims)

]

should solve your problem :)
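
The same conversion can be easier to check when it is unrolled into a small function. The sketch below is only an illustration: the name `words_to_abs_boxes` is hypothetical, and it assumes straight (non-rotated) boxes, where each word's `geometry` is a pair of relative points ((xmin, ymin), (xmax, ymax)).

```python
def words_to_abs_boxes(export):
    """Return, for each page, a list of [xmin, ymin, xmax, ymax] pixel boxes.

    `export` is the dict returned by result.export(). Assumes straight boxes,
    i.e. geometry == ((xmin, ymin), (xmax, ymax)) in relative coordinates.
    """
    pages = []
    for page in export["pages"]:
        # Page dimensions are (height, width), not (width, height).
        height, width = page["dimensions"]
        boxes = []
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (x0, y0), (x1, y1) = word["geometry"]
                    boxes.append([
                        int(round(x0 * width)),   # x scales with the width
                        int(round(y0 * height)),  # y scales with the height
                        int(round(x1 * width)),
                        int(round(y1 * height)),
                    ])
        pages.append(boxes)
    return pages
```

Calling `words_to_abs_boxes(result.export())` should give the same nested list shape as `words_abs_coords`, one list of boxes per page.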

Cheers!
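
For the second half of the question (drawing every box and saving the crops to a folder), here is a minimal sketch rather than a definitive implementation. The function names `clamp_box` and `draw_and_crop`, the default `crops` folder, and the `word_000.png` file naming are all assumptions made for illustration; the boxes are expected in the [xmin, ymin, xmax, ymax] pixel format used in the question.

```python
import os


def clamp_box(box, width, height):
    """Clip an [xmin, ymin, xmax, ymax] box to the image bounds."""
    xmin, ymin, xmax, ymax = box
    xmin = max(0, min(xmin, width))
    xmax = max(0, min(xmax, width))
    ymin = max(0, min(ymin, height))
    ymax = max(0, min(ymax, height))
    return xmin, ymin, xmax, ymax


def draw_and_crop(image_path, boxes, out_dir="crops"):
    """Save one crop per box plus an annotated copy; return the crop paths."""
    import cv2  # imported here so clamp_box stays usable without OpenCV

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(image_path)
    height, width = image.shape[:2]
    os.makedirs(out_dir, exist_ok=True)

    annotated = image.copy()  # draw on a copy so the crops stay clean
    saved = []
    for i, box in enumerate(boxes):
        xmin, ymin, xmax, ymax = clamp_box(box, width, height)
        if xmax <= xmin or ymax <= ymin:
            continue  # skip empty boxes
        # NumPy images are indexed [row, column], i.e. [y, x].
        crop = image[ymin:ymax, xmin:xmax]
        path = os.path.join(out_dir, f"word_{i:03d}.png")
        cv2.imwrite(path, crop)
        saved.append(path)
        cv2.rectangle(annotated, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)

    cv2.imwrite(os.path.join(out_dir, "annotated.png"), annotated)
    return saved
```

With the variables from the question, this would be called as `draw_and_crop("/content/passbook_64_0.jpeg", words_abs_coords[0])`, passing the same image that was given to the OCR model. Note also that OpenCV loads images in BGR order, so to display one with `plt.imshow` it should first be converted with `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`.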
