How to Crop Image Based on Google Vision API Bounding Poly Normalized Vertices using OpenCV for Python
I am working on implementing the Google Vision Detect Multiple Objects API (https://cloud.google.com/vision/docs/object-localizer) in Python.
The problem I am running into is that I don't know how to use the boundingPoly normalizedVertices returned in the response to work out how to crop the original image with OpenCV.
Sample response:
{
  "responses": [
    {
      "localizedObjectAnnotations": [
        {
          "mid": "/m/0bt_c3",
          "name": "Book",
          "score": 0.8462029,
          "boundingPoly": {
            "normalizedVertices": [
              { "x": 0.1758254, "y": 0.046406608 },
              { "x": 0.84299797, "y": 0.046406608 },
              { "x": 0.84299797, "y": 0.9397349 },
              { "x": 0.1758254, "y": 0.9397349 }
            ]
          }
        }
      ]
    }
  ]
}
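For reference, pulling the normalized vertices out of a response like this takes only a few lines of JSON handling; a minimal sketch, with the response above inlined as a string (note that the API can omit "x" or "y" entirely when a coordinate is 0, so `.get` with a default is safer than direct indexing):

```python
import json

# The sample response from above, inlined for a self-contained example.
response_json = """
{"responses": [{"localizedObjectAnnotations": [{"mid": "/m/0bt_c3",
  "name": "Book", "score": 0.8462029, "boundingPoly": {"normalizedVertices": [
  {"x": 0.1758254, "y": 0.046406608}, {"x": 0.84299797, "y": 0.046406608},
  {"x": 0.84299797, "y": 0.9397349}, {"x": 0.1758254, "y": 0.9397349}]}}]}]}
"""

data = json.loads(response_json)
annotation = data["responses"][0]["localizedObjectAnnotations"][0]

# Use .get with a 0.0 default: the API omits a field when its value is 0.
points = [
    (v.get("x", 0.0), v.get("y", 0.0))
    for v in annotation["boundingPoly"]["normalizedVertices"]
]
print(annotation["name"], points)
```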
Update
So these are the coordinates I am working with.
points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]
When I run my code using the answer provided by @MSS and use it to draw the contour, I get the image below.
from this import d
from pyimagesearch import imutils
from skimage import exposure
import numpy as np
import argparse
import cv2
from skimage.transform import rotate
from rembg import remove
ap = argparse.ArgumentParser()
ap.add_argument("-q", "--query", required=True,
                help="Path to the query image")
args = vars(ap.parse_args())
image = cv2.imread(args["query"])
orig = image.copy()
IMAGE_SHAPE = image.shape
points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]
coords = []
for point in points:
    pixels = tuple(round(coord * dimension) for coord, dimension in zip(point, IMAGE_SHAPE))
    coords.append(pixels)
points = np.array(coords)
cv2.drawContours(image, [points], -1, (0, 255, 0), 1)
cv2.imshow("Image", image)
cv2.waitKey(0)
Here is the output image. It looks like the crop is off; the cropped image that gets output matches the contour as well.
You can see in this screenshot that it appears to be locating the object correctly.
Update: the final problem was that the dimensions were flipped: image.shape returns (height, width), while the normalized vertices are in (x, y) order. I had to read IMAGE_SHAPE and do this:
IMAGE_SHAPE = image.shape[:2]
IMAGE_SHAPE = (IMAGE_SHAPE[1], IMAGE_SHAPE[0])
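To make the fix concrete, here is a small self-contained sketch of the corrected conversion; the 1280x960 dimensions are stand-ins for illustration (with a real image, take them from image.shape[:2]):

```python
# Convert normalized (x, y) vertices to pixel coordinates.
# OpenCV's image.shape is (height, width, channels), but the vertices
# are (x, y), so the shape must be reordered to (width, height).
height, width = 1280, 960  # stand-in dimensions; use image.shape[:2] in practice
IMAGE_SHAPE = (width, height)

points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]

coords = [
    tuple(round(coord * dimension) for coord, dimension in zip(point, IMAGE_SHAPE))
    for point in points
]
print(coords)  # [(170, 58), (809, 58), (809, 1200), (170, 1200)]
```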
You have to denormalize the coordinates against the size of the original image to get the true pixel coordinates:
(number_of_rows, number_of_columns) = image.shape[:2]
x_unnormalized = round(x_normalized * number_of_columns)
y_unnormalized = round(y_normalized * number_of_rows)
...
cropped_image = image[y_unnormalized:y_unnormalized + h, x_unnormalized:x_unnormalized + w]
This assumes the normalized values were obtained as:
normalized_value = true_value / max(all_values)
i.e. for this API, x is divided by the image width and y by the image height. If some other normalization was applied, then you have to apply the inverse of that particular normalization.
Update:
Here is the working code. I have tested it and it works fine. I think your assumption about the coordinate values was incorrect.
import cv2

image = cv2.imread("Path to image.jpg")
orig = image.copy()
(number_of_rows, number_of_columns) = image.shape[:2]

points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]

# Each vertex is (x, y): x scales with the image width (columns),
# y with the image height (rows).
first_point_x = round(points[0][0] * number_of_columns)
first_point_y = round(points[0][1] * number_of_rows)
second_point_x = round(points[2][0] * number_of_columns)
second_point_y = round(points[2][1] * number_of_rows)

# cv2.rectangle takes its corner points in (x, y) order.
image = cv2.rectangle(image, (first_point_x, first_point_y),
                      (second_point_x, second_point_y), (0, 255, 0), 1)
cv2.imshow("Image", image)
cv2.waitKey(0)
Here is the output image:
First convert the normalized coordinates to pixel coordinates as follows:
test_coord = (0.5, 0.3)
IMAGE_SHAPE = (1920, 1080)  # example; must be (width, height) to match the (x, y) vertex order

def to_pixel_coords(relative_coords):
    return tuple(round(coord * dimension) for coord, dimension in zip(relative_coords, IMAGE_SHAPE))
Once you have the pixel coordinates, say (x1, y1), (x2, y2), (x3, y3) and (x4, y4), you can crop the original image as follows:
top_left_x = min([x1,x2,x3,x4])
top_left_y = min([y1,y2,y3,y4])
bot_right_x = max([x1,x2,x3,x4])
bot_right_y = max([y1,y2,y3,y4])
cropped = img[top_left_y:bot_right_y + 1, top_left_x:bot_right_x + 1]  # +1 because the stop index is excluded when slicing
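Putting both steps together, here is a minimal end-to-end sketch using the book's vertices from the response at the top; a synthetic NumPy array stands in for a real photo, so the dimensions are illustrative:

```python
import numpy as np

def to_pixel_coords(relative_coords, width, height):
    # Normalized vertices are (x, y); x scales with width, y with height.
    return tuple(round(c * d) for c, d in zip(relative_coords, (width, height)))

# Synthetic stand-in for cv2.imread(...): a 1280-row by 960-column image.
img = np.zeros((1280, 960, 3), dtype=np.uint8)
height, width = img.shape[:2]

vertices = [
    (0.1758254, 0.046406608),
    (0.84299797, 0.046406608),
    (0.84299797, 0.9397349),
    (0.1758254, 0.9397349),
]

pixels = [to_pixel_coords(v, width, height) for v in vertices]
xs = [p[0] for p in pixels]
ys = [p[1] for p in pixels]

# Crop with min/max so the vertex order does not matter;
# rows are indexed by y and columns by x.
cropped = img[min(ys):max(ys) + 1, min(xs):max(xs) + 1]
print(cropped.shape)  # (1145, 641, 3)
```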