Resize bounding box according to image
I am implementing object localization in Python. One problem I have run into is that when I resize the observable region after taking an action, I do not know how to change the ground truth box along with it. As a result, this happens:
The ground truth box is not resized to fit the aircraft accurately, so I cannot localize correctly. My current function for formatting the next state is as follows:
def next_state(init_input, b, b_prime, g, a):
    """
    Returns the observable region of the next state.

    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.

    :param init_input:
        The initial input volume of the current episode.
    :param b:
        The current state's bounding box.
    :param b_prime:
        The subsequent state's bounding box.
    :param g:
        The ground truth box of the target object.
    :param a:
        The action taken by the agent at the current step.
    """
    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224))

    # Difference between crop region and image dimensions
    x1_diff = x1
    y1_diff = y1
    x2_diff = IMG_SIZE - x2
    y2_diff = IMG_SIZE - y2

    # Resize ground truth box
    g[0] = int(g[0] - 0.5 * x1_diff)  # x1
    g[1] = int(g[1] - 0.5 * y1_diff)  # y1
    g[2] = int(g[2] + 0.5 * x2_diff)  # x2
    g[3] = int(g[3] + 0.5 * y2_diff)  # y2

    return observable_region, g
I cannot seem to get the resizing right. I followed this post to do the initial bounding box resizing; however, that solution does not seem to work in this case.
The bounding box / ground truth box format is: b = [x1, y1, x2, y2].
init_input has dimensions (224, 224, 3), IMG_SIZE = 224, and context_pixels = 16.
Here is an additional example:
It seems the ground truth box is the correct size, but its position is off.
I have updated the code above. A scale factor appears to be the wrong approach to the problem; by adding/subtracting the number of pixels to enlarge by, I have gotten much closer. I believe the remaining issue is related to interpolation, so if anyone could help with that, it would be a great help.
New example:
A solution was provided.
My problem was solved by the user @lenik in this post. Before applying the scale factor to the pixel coordinates of the ground truth box g, the zero offset must first be subtracted, so that x1, y1 become 0, 0. This allows the scaling to work correctly.
Thus, the coordinates of any arbitrary point (x, y) after the transformation can be computed as:
x_new = (x - x1) * IMG_SIZE / (x2 - x1)
y_new = (y - y1) * IMG_SIZE / (y2 - y1)
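As a minimal sketch of the mapping above (the helper name and the sample crop coordinates are illustrative, with IMG_SIZE = 224 as in the question):

```python
IMG_SIZE = 224

def to_crop_coords(x, y, x1, y1, x2, y2):
    """Map a point (x, y) from full-image coordinates into the
    (IMG_SIZE, IMG_SIZE) frame of the crop spanning (x1, y1)-(x2, y2):
    subtract the zero offset first, then apply the scale factor."""
    x_new = (x - x1) * IMG_SIZE / (x2 - x1)
    y_new = (y - y1) * IMG_SIZE / (y2 - y1)
    return x_new, y_new

# A 112x112 crop from x in [50, 162) and y in [30, 142): after the
# offset is removed, every coordinate is scaled by exactly 224/112 = 2.
print(to_crop_coords(50, 30, 50, 30, 162, 142))    # -> (0.0, 0.0)
print(to_crop_coords(106, 86, 50, 30, 162, 142))   # -> (112.0, 112.0)
```

Note that scaling without first subtracting the offset would leave the box shifted, which matches the "correct size, wrong position" symptom described above.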
In code, applied to my problem, the solution is as follows:
def next_state(init_input, b_prime, g):
    """
    Returns the observable region of the next state.

    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.

    :param init_input:
        The initial input volume of the current episode.
    :param b_prime:
        The subsequent state's bounding box.
    :param g:
        The ground truth box of the target object.
    """
    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224),
                                   interpolation=cv2.INTER_AREA)

    # Resize ground truth box: subtract the crop offset, then scale
    g[0] = int((g[0] - x1) * IMG_SIZE / (x2 - x1))  # x1
    g[1] = int((g[1] - y1) * IMG_SIZE / (y2 - y1))  # y1
    g[2] = int((g[2] - x1) * IMG_SIZE / (x2 - x1))  # x2
    g[3] = int((g[3] - y1) * IMG_SIZE / (y2 - y1))  # y2

    return observable_region, g