Sliding window: how do I get the window's location on the image?

Question

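This great sliding window implementation in Python was mentioned elsewhere: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box. My question is: where in the code can I actually see the current window's position on the image? Or how can I get hold of its position?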

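After lines 72 and 85 I tried printing shape and newstrides, but apparently that told me nothing about the position. In the norm_shape function I printed tuple, but the output was only the window dimensions (if I understood that correctly).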

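But I need more than the dimensions (width and height): I also need to know exactly where in the image the window is extracted from, either as pixel coordinates or as the rows/columns of the image.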

Answer 1

It may be easier to understand what is going on if you use flatten=False to create a "grid" of windows over the image:

import numpy as np
from scipy.misc import lena
from matplotlib import pyplot as plt

img = lena()
print(img.shape)
# (512, 512)

# make a 64x64 pixel sliding window on img. 
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)

plt.imshow(win[4, 4, ...])
plt.draw()
# grid position [4, 4] contains Lena's eye and nose
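
The sliding_window function itself is not reproduced in this answer. As a rough stand-in, assuming NumPy 1.20 or later, the same non-overlapping grid can probably be built with numpy.lib.stride_tricks.sliding_window_view. The helper name grid_windows and the synthetic image below are illustrative assumptions, not part of the linked file:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def grid_windows(img, win_shape, shift=None):
    """Return a (rows, cols, win_h, win_w) view of 2D windows over img.

    Hypothetical helper, not part of the linked sliding_window.py.
    """
    if shift is None:
        shift = win_shape          # non-overlapping tiles
    # build a window at every pixel offset, then keep every shift-th one
    all_wins = sliding_window_view(img, win_shape)
    return all_wins[::shift[0], ::shift[1]]

img = np.arange(512 * 512).reshape(512, 512)   # synthetic stand-in image
win = grid_windows(img, (64, 64))
print(win.shape)
# (8, 8, 64, 64)
```

Because the result is a strided view, no pixel data is copied, and the grid index times the shift still gives the top-left pixel of each window.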

To get the corresponding pixel coordinates, you could do something like this:

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right

# check for grid position [3, 4]
t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))

print(np.all(img[t:b, l:r] == win[3, 4, :, :]))
# True

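If you use flatten=True, the 8x8 grid of 64x64-pixel windows is flattened into a 64-long vector of 64x64-pixel windows. In that case you can use something like np.unravel_index to convert the 1D vector index into a tuple of grid indices, then use those to get the pixel coordinates as above: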

win = sliding_window(img, (64, 64), flatten=True)

grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))

print(np.all(img[t:b, l:r] == win[12]))
# True
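
When the windows overlap, the grid shape is no longer simply the image size divided by the window size. A minimal sketch of the same lookup for that case, assuming 2D windows laid out row by row; flat_index_to_coords is a hypothetical helper name:

```python
import numpy as np

def flat_index_to_coords(flat_idx, img_shape, win_shape, shift):
    """Map an index into the flattened window list to (top, bottom, left, right).

    Hypothetical helper; assumes 2D windows laid out row by row.
    """
    # number of window positions along each axis
    n_rows = (img_shape[0] - win_shape[0]) // shift[0] + 1
    n_cols = (img_shape[1] - win_shape[1]) // shift[1] + 1
    gr, gc = np.unravel_index(flat_idx, (n_rows, n_cols))
    top, left = int(gr) * shift[0], int(gc) * shift[1]
    return top, top + win_shape[0], left, left + win_shape[1]

# 512x512 image, 64x64 windows shifted by 32 pixels -> 15x15 grid
print(flat_index_to_coords(12, (512, 512), (64, 64), (32, 32)))
# (0, 64, 384, 448)
```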

OK, I'll try to address some of the questions you raised in the comments.

"I want the pixel location of the window relative to the actual pixel dimensions of the original image."

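Perhaps I wasn't clear enough: you can already do this with a function like get_win_pixel_coords(), which gives you the top, bottom, left and right coordinates of the window relative to the image. For example: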

win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy')         # position of Lena's eye, relative to this window

t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))

ax2.imshow(img)
# plot() takes (x, y), i.e. (column, row): add the left offset to x and the top offset to y
ax2.plot(l + 8, t + 9, 'oy') # position of Lena's eye, relative to whole image

plt.show()

Note also that I've updated get_win_pixel_coords() to handle cases where shiftSize is not None (i.e. the windows do not tile the image perfectly without overlap).

"So I'm guessing that in that case I should make the grid equal to the original image's dimensions, is that right? (Instead of using 8x8.)"

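No. If the windows tile the image without overlap (i.e. shiftSize=None, which is what I've assumed so far), then making the grid dimensions equal to the pixel dimensions of the image would mean every window contains just a single pixel!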

"So in my case, for an image of width 360 and height 240, that would mean I use this line: grid_pos = np.unravel_index(12, (240, 360)). Also, what does the 12 stand for in that line?"

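As I said, making the "grid size" equal to the image dimensions would be pointless, since every window would contain only a single pixel (at least, assuming the windows do not overlap). The 12 refers to an index into the flattened grid of windows, e.g.: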

x = np.arange(25).reshape(5, 5)    # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel()                 # flatten it into a 25-long vector
print(x_flat[12])                  # the element at index 12 in the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5))  # corresponding row/col index in x
print(x[row, col])
# 12
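
Going the other way, from a grid position back to the flat index, np.ravel_multi_index is the inverse of np.unravel_index. A small sketch using the 8x8 grid from the earlier example:

```python
import numpy as np

grid_shape = (8, 8)                                # 8x8 grid of windows
flat = np.ravel_multi_index((1, 4), grid_shape)    # row 1, column 4
print(flat)
# 12
row, col = np.unravel_index(flat, grid_shape)      # back to the grid position
print(int(row), int(col))
# 1 4
```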

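"I am shifting each window by 10 pixels: the first sliding window starts at coordinate 0x0 on the image, the second at 10x10, and so on. I would then like the program to return not just the window contents but also the coordinates of each window, i.e. 0,0, then 10,10, and so on."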

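As I said, you can already get the position of the window relative to the image using the top, bottom, left and right coordinates returned by get_win_pixel_coords(). If you really wanted to, you could wrap this up into a single function: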

def get_pixels_and_coords(win_grid, grid_pos):
    pix = win_grid[grid_pos]
    tblr = get_win_pixel_coords(grid_pos, pix.shape)
    return pix, tblr

# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))
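
If all you need is each window together with its top-left coordinate, a plain generator may be simpler than the strided approach. This is only a sketch: iter_windows is a hypothetical name, the 64x64 window size is an assumption, and windows that would cross the image edge are skipped:

```python
import numpy as np

def iter_windows(img, win_shape, step):
    """Yield (top, left, window) for each window position.

    Hypothetical helper; windows that would cross the image edge are skipped.
    """
    win_h, win_w = win_shape
    for top in range(0, img.shape[0] - win_h + 1, step):
        for left in range(0, img.shape[1] - win_w + 1, step):
            yield top, left, img[top:top + win_h, left:left + win_w]

img = np.zeros((240, 360))          # same size as the image in the comment
coords = [(t, l) for t, l, _ in iter_windows(img, (64, 64), step=10)]
print(coords[:3])
# [(0, 0), (0, 10), (0, 20)]
print(len(coords))
# 540
```

Note that the windows advance along a row first, so the second window starts at row 0, column 10 rather than at 10,10.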

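If you want the coordinates of every pixel in the window relative to the image, another trick is to construct arrays containing the row and column indices of every pixel in the image, then apply the sliding window to those: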

ridx, cidx = np.indices(img.shape)
r_win = sliding_window(ridx, (64, 64), shiftSize=None, flatten=False)
c_win = sliding_window(cidx, (64, 64), shiftSize=None, flatten=False)

pix = win[3, 4]    # pixel values
r = r_win[3, 4]    # row index of every pixel in the window
c = c_win[3, 4]    # column index of every pixel in the window
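
The same index-array trick also works for overlapping windows. Here is a self-contained sketch, assuming NumPy 1.20 or later and using sliding_window_view in place of the linked sliding_window function; the 10-pixel step and the random image are illustrative assumptions:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.random.rand(240, 360)                  # synthetic stand-in image
ridx, cidx = np.indices(img.shape)

step = 10
# windows of the image and of the two index arrays, all sampled the same way
win   = sliding_window_view(img,  (64, 64))[::step, ::step]
r_win = sliding_window_view(ridx, (64, 64))[::step, ::step]
c_win = sliding_window_view(cidx, (64, 64))[::step, ::step]

# top-left pixel coordinate of the window at grid position (2, 3)
top, left = r_win[2, 3, 0, 0], c_win[2, 3, 0, 0]
print(top, left)
# 20 30
```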

Answer 2

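An update to @ali_m's answer, since scipy.misc.lena() is no longer available in SciPy > 0.17. Here is an example using the RGB image scipy.misc.face(), with slight modifications to the sliding window source code provided in the OP.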

import numpy as np
from scipy.misc import ascent, face
from matplotlib import pyplot as plt
from numpy.lib.stride_tricks import as_strided as ast

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right


def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple,
    even for one-dimensional shapes.
    Parameters
        shape - an int, or a tuple of ints
    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions
    '''
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    # convert ws, ss, and a.shape to numpy arrays
    ws = np.array(ws)
    ss = np.array(ss)
    shap = np.array(a.shape)
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shap),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shap):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    # how many slices will there be in each dimension?
    newshape = norm_shape(((shap - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    a = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return a
    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    #dim = filter(lambda i : i != 1,dim)
    return a.reshape(dim), newshape

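Adding the variable newshape to the return values of sliding_window() lets you pass flatten=True and still know the layout of the grid created by the sliding window function. In my application a flat vector of computation windows is needed, because it is a good extension point for a calculation applied to each computation window.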

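If you apply a 96x96 window (i.e. tile x tile) with 50% overlap in both directions to an image of shape (768, 1024, 3), the input image can be padded before the sliding windows are created, so that it divides into a whole number of windows with no remainder.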

img = face()
nxo,nyo,nzo = img.shape

tile=96 
pad_img = np.vstack((np.hstack((img,np.fliplr(img))),np.flipud(np.hstack((img,np.fliplr(img))))))

pad_img = pad_img[:nxo+(nxo % tile),:nyo+(nyo % tile), :]



win, ind = sliding_window(pad_img, (96, 96,3), (48,48,3))
print(ind)
# (15, 21, 1, 96, 96, 3)
print(win.shape)
# (315, 96, 96, 3)

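The grid of computation windows has 15 rows and 21 columns, for 315 computation windows in total. grid_pos can be determined from an index into the flattened vector of computation windows (i.e. win) together with ind[0] and ind[1]. If we are interested in the computation window at index 239: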

grid_pos = np.unravel_index(239,(ind[0],ind[1]))
print(grid_pos)
#(11, 8)

The bounding coordinates of that computation window in the image can then be found with:

t, b, l, r = get_win_pixel_coords(grid_pos, (96, 96), (48,48))
print(np.all(pad_img[t:b, l:r] == win[239]))
#True
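
As an alternative to the flip-and-stack padding above, np.pad can add just enough reflected pixels at the bottom and right edges for the windows to cover the image exactly. This is a sketch rather than the method used above: pad_for_windows is a hypothetical helper, and it assumes each image side is at least one tile long:

```python
import numpy as np

def pad_for_windows(img, tile, step):
    """Reflect-pad the bottom/right of img so windows cover it exactly.

    Hypothetical helper; assumes each image side is at least `tile` pixels.
    """
    pads = []
    for n in img.shape[:2]:
        # leftover pixels after the last full window, if any
        rem = (n - tile) % step
        pads.append((0, (step - rem) % step))
    pads += [(0, 0)] * (img.ndim - 2)       # leave colour channels alone
    return np.pad(img, pads, mode='reflect')

img = np.zeros((768, 1024, 3), dtype=np.uint8)   # same shape as face()
padded = pad_for_windows(img, tile=96, step=48)
print(padded.shape)
# (768, 1056, 3)
```

With this padding, (1056 - 96) / 48 + 1 gives exactly 21 window columns, matching the 15x21 grid reported above.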
