How to Crop Image Based on Google Vision API Bounding Poly Normalized Vertices using OpenCV for Python
I am working on implementing the Google Vision Detect Multiple Objects API (https://cloud.google.com/vision/docs/object-localizer) in Python.
The problem I am running into is that I don't know how to use the boundingPoly normalizedVertices returned in the response to work out how to crop the original image with OpenCV.
Sample response:
{
  "responses": [
    {
      "localizedObjectAnnotations": [
        {
          "mid": "/m/0bt_c3",
          "name": "Book",
          "score": 0.8462029,
          "boundingPoly": {
            "normalizedVertices": [
              { "x": 0.1758254, "y": 0.046406608 },
              { "x": 0.84299797, "y": 0.046406608 },
              { "x": 0.84299797, "y": 0.9397349 },
              { "x": 0.1758254, "y": 0.9397349 }
            ]
          }
        }
      ]
    }
  ]
}
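For reference, pulling the normalized vertices out of a response like this takes only a few lines of JSON handling; a minimal sketch, with the response above inlined as a string (note that the API can omit "x" or "y" entirely when a coordinate is 0, so `.get` with a default is safer than direct indexing):

```python
import json

# The sample response from above, inlined for a self-contained example.
response_json = """
{"responses": [{"localizedObjectAnnotations": [{"mid": "/m/0bt_c3",
  "name": "Book", "score": 0.8462029, "boundingPoly": {"normalizedVertices": [
  {"x": 0.1758254, "y": 0.046406608}, {"x": 0.84299797, "y": 0.046406608},
  {"x": 0.84299797, "y": 0.9397349}, {"x": 0.1758254, "y": 0.9397349}]}}]}]}
"""

data = json.loads(response_json)
annotation = data["responses"][0]["localizedObjectAnnotations"][0]

# Use .get with a 0.0 default: the API omits a field when its value is 0.
points = [
    (v.get("x", 0.0), v.get("y", 0.0))
    for v in annotation["boundingPoly"]["normalizedVertices"]
]
print(annotation["name"], points)
```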
Update
So these are the coordinates I am working with.
points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]
When I run my code using the answer provided by @MSS and use it to draw the contour, I get the image below.
from this import d
from pyimagesearch import imutils
from skimage import exposure
import numpy as np
import argparse
import cv2
from skimage.transform import rotate
from rembg import remove
ap = argparse.ArgumentParser()
ap.add_argument("-q", "--query", required=True,
                help="Path to the query image")
args = vars(ap.parse_args())
image = cv2.imread(args["query"])
orig = image.copy()
IMAGE_SHAPE = image.shape
points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]
coords = []
for point in points:
    pixels = tuple(round(coord * dimension) for coord, dimension in zip(point, IMAGE_SHAPE))
    coords.append(pixels)
points = np.array(coords)
cv2.drawContours(image, [points], -1, (0, 255, 0), 1)
cv2.imshow("Image", image)
cv2.waitKey(0)
Here is the output image. It looks like the crop is off; the cropped image that gets output matches the contour as well.
You can see in this screenshot that it appears to be locating the object correctly.
Update: the final problem was that the dimensions were flipped: image.shape returns (height, width), while the normalized vertices are in (x, y) order. I had to read IMAGE_SHAPE and do this:
IMAGE_SHAPE = image.shape[:2]
IMAGE_SHAPE = (IMAGE_SHAPE[1], IMAGE_SHAPE[0])
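To make the fix concrete, here is a small self-contained sketch of the corrected conversion; the 1280x960 dimensions are stand-ins for illustration (with a real image, take them from image.shape[:2]):

```python
# Convert normalized (x, y) vertices to pixel coordinates.
# OpenCV's image.shape is (height, width, channels), but the vertices
# are (x, y), so the shape must be reordered to (width, height).
height, width = 1280, 960  # stand-in dimensions; use image.shape[:2] in practice
IMAGE_SHAPE = (width, height)

points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]

coords = [
    tuple(round(coord * dimension) for coord, dimension in zip(point, IMAGE_SHAPE))
    for point in points
]
print(coords)  # [(170, 58), (809, 58), (809, 1200), (170, 1200)]
```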
You have to denormalize the coordinates against the size of the original image to get the true pixel coordinates:
(number_of_rows, number_of_columns) = image.shape[:2]
x_unnormalized = round(x_normalized * number_of_columns)
y_unnormalized = round(y_normalized * number_of_rows)
...
cropped_image = image[y_unnormalized:y_unnormalized + h, x_unnormalized:x_unnormalized + w]
This assumes the normalized values were obtained as:
normalized_value = true_value / max(all_values)
i.e. for this API, x is divided by the image width and y by the image height. If some other normalization was applied, then you have to apply the inverse of that particular normalization.
Update:
Here is the working code. I have tested it and it works fine. I think your assumption about the coordinate values was incorrect.
import cv2

image = cv2.imread("Path to image.jpg")
orig = image.copy()
(number_of_rows, number_of_columns) = image.shape[:2]

points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]

# Each vertex is (x, y): x scales with the image width (columns),
# y with the image height (rows).
first_point_x = round(points[0][0] * number_of_columns)
first_point_y = round(points[0][1] * number_of_rows)
second_point_x = round(points[2][0] * number_of_columns)
second_point_y = round(points[2][1] * number_of_rows)

# cv2.rectangle takes its corner points in (x, y) order.
image = cv2.rectangle(image, (first_point_x, first_point_y),
                      (second_point_x, second_point_y), (0, 255, 0), 1)
cv2.imshow("Image", image)
cv2.waitKey(0)
Here is the output image:
First convert the normalized coordinates to pixel coordinates as follows:
test_coord = (0.5, 0.3)
IMAGE_SHAPE = (1920, 1080)  # example; must be (width, height) to match the (x, y) vertex order

def to_pixel_coords(relative_coords):
    return tuple(round(coord * dimension) for coord, dimension in zip(relative_coords, IMAGE_SHAPE))
Once you have the pixel coordinates, say (x1, y1), (x2, y2), (x3, y3) and (x4, y4), you can crop the original image as follows:
top_left_x = min([x1,x2,x3,x4])
top_left_y = min([y1,y2,y3,y4])
bot_right_x = max([x1,x2,x3,x4])
bot_right_y = max([y1,y2,y3,y4])
cropped = img[top_left_y:bot_right_y + 1, top_left_x:bot_right_x + 1]  # +1 because the stop index is excluded when slicing
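Putting both steps together, here is a minimal end-to-end sketch using the book's vertices from the response at the top; a synthetic NumPy array stands in for a real photo, so the dimensions are illustrative:

```python
import numpy as np

def to_pixel_coords(relative_coords, width, height):
    # Normalized vertices are (x, y); x scales with width, y with height.
    return tuple(round(c * d) for c, d in zip(relative_coords, (width, height)))

# Synthetic stand-in for cv2.imread(...): a 1280-row by 960-column image.
img = np.zeros((1280, 960, 3), dtype=np.uint8)
height, width = img.shape[:2]

vertices = [
    (0.1758254, 0.046406608),
    (0.84299797, 0.046406608),
    (0.84299797, 0.9397349),
    (0.1758254, 0.9397349),
]

pixels = [to_pixel_coords(v, width, height) for v in vertices]
xs = [p[0] for p in pixels]
ys = [p[1] for p in pixels]

# Crop with min/max so the vertex order does not matter;
# rows are indexed by y and columns by x.
cropped = img[min(ys):max(ys) + 1, min(xs):max(xs) + 1]
print(cropped.shape)  # (1145, 641, 3)
```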