NumPy: execute a function over each ndarray element

Question

I have a three-dimensional ndarray of 2D coordinates, for example:

[[[1704 1240]
  [1745 1244]
  [1972 1290]
  [2129 1395]
  [1989 1332]]

 [[1712 1246]
  [1750 1246]
  [1964 1286]
  [2138 1399]
  [1989 1333]]

 [[1721 1249]
  [1756 1249]
  [1955 1283]
  [2145 1399]
  [1990 1333]]]

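The ultimate aim is to remove the point closest to a given point ([1989 1332]) from each "group" of 5 coordinates. My thought was to produce a similarly shaped array of distances, and then use argmin to determine the indices of the values to remove. However, I am not sure how to apply a function, such as one that calculates the distance to a given point, to every element in an ndarray, at least in a NumPythonic way.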

Answer 1

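List comprehensions are a very inefficient way to deal with numpy arrays. They're an especially poor choice for the distance calculation.

To find the difference between your data and a point, you'd just do data - point. You can then calculate the distance using np.hypot, or if you'd prefer, square it, sum it, and take the square root.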
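To make the two options concrete, here is a minimal sketch on a cut-down, made-up sample in the same layout as the question. The names diff, dist_hypot and dist_manual are only for illustration.

```python
import numpy as np

# Small sample in the same (groups, points, xy) layout as the question.
data = np.array([[[1704, 1240], [1989, 1332]],
                 [[1712, 1246], [1989, 1333]]])
point = np.array([1989, 1332])

# Broadcasting subtracts `point` from every (x, y) pair.
diff = data - point

# Two equivalent ways to get the Euclidean distance of each pair.
dist_hypot = np.hypot(diff[..., 0], diff[..., 1])
dist_manual = np.sqrt((diff ** 2).sum(axis=-1))

print(dist_hypot)
print(np.allclose(dist_hypot, dist_manual))
```

Both results have shape (2, 2), one distance per coordinate pair, so no reshape is needed when indexing the x and y columns directly.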

It's a bit easier if you make it an Nx2 array for the purposes of the calculation, though.

Basically, you want something like this:

import numpy as np

data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],

                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],

                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])

point = [1989, 1332]

#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)

# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
print(dist)

This yields:

array([[[ 299.48121811],
        [ 259.38388539],
        [  45.31004304],
        [ 153.5219854 ],
        [   0.        ]],

       [[ 290.04310025],
        [ 254.0019685 ],
        [  52.35456045],
        [ 163.37074401],
        [   1.        ]],

       [[ 280.55837182],
        [ 247.34186868],
        [  59.6405902 ],
        [ 169.77926846],
        [   1.41421356]]])

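Now, removing the closest element is a bit harder than just getting the closest element.

With numpy, you can use boolean indexing to do this fairly easily.

However, you'll need to worry a bit about the alignment of your axes.

The key there is to understand that numpy "broadcasts" operations along the last axis. In this case, we want to broadcast along the middle axis.

Also, -1 can be used as a placeholder for the size of an axis. Numpy will calculate the permissible size when -1 is put in as the size of an axis.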
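As a quick illustration of the -1 placeholder, here is a small sketch on made-up values (flat, cube and pairs are hypothetical names):

```python
import numpy as np

flat = np.arange(24)           # 24 made-up values

# -1 asks NumPy to infer that axis: 24 / (3 * 2) = 4
cube = flat.reshape(3, -1, 2)
print(cube.shape)              # (3, 4, 2)

# Only one axis may be given as -1 in a single reshape call.
pairs = flat.reshape(-1, 2)
print(pairs.shape)             # (12, 2)
```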

What we'd need to do would look a bit like this:

#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]

# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

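You could make that a single line, I'm just breaking it down for readability. The key is that dist != something yields a boolean array which you can then use to index the original array.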
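To see why the final reshape is needed, here is a small sketch with made-up numbers. It uses keepdims=True as one possible way to keep the per-group minimum aligned with the middle axis; that choice is an assumption of this sketch rather than part of the code above.

```python
import numpy as np

# Made-up distances for 2 groups of 3 points each.
dist = np.array([[5.0, 1.0, 3.0],
                 [2.0, 4.0, 0.5]])
# Matching made-up coordinates, shape (2, 3, 2).
coords = np.arange(12).reshape(2, 3, 2)

# keepdims=True leaves the per-group minimum with shape (2, 1),
# so it broadcasts across the middle axis of the (2, 3) array.
group_min = dist.min(axis=1, keepdims=True)
mask = dist != group_min
print(mask)

# A 2-D boolean mask on a 3-D array collapses the first two axes,
# which is why the result has to be reshaped back into groups.
kept = coords[mask]
print(kept.shape)   # (4, 2)
```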

So, putting it all together:

import numpy as np

data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],

                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],

                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])

point = [1989, 1332]

#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)

# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)

#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]

# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

print(filtered)

Yields:

array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])

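On a side note, this won't work if more than one point is equally close. Numpy arrays have to have the same number of elements along each dimension, so you'd need to re-do your grouping in that case.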
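One possible way around the tie problem, sketched here as an alternative rather than as part of the code above, is to build the mask from argmin, which returns only the first minimum in each group. The helper name drop_closest is hypothetical.

```python
import numpy as np

def drop_closest(data, point):
    """Remove exactly one nearest point per group, even when distances tie."""
    diff = data - np.asarray(point)
    dist = np.hypot(diff[..., 0], diff[..., 1])      # shape (groups, points)
    nearest = dist.argmin(axis=1)                    # first minimum per group
    mask = np.ones(dist.shape, dtype=bool)
    mask[np.arange(dist.shape[0]), nearest] = False  # drop one entry per group
    return data[mask].reshape(data.shape[0], -1, data.shape[2])

# Made-up data: the first group has two points equally close to the target.
data = np.array([[[0, 1], [1, 0], [5, 5]],
                 [[2, 2], [0, 0], [3, 3]]])
result = drop_closest(data, [0, 0])
print(result)
```

Because exactly one entry per group is masked out, every group keeps the same length and the reshape stays valid.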

Answer 2

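If I understand your question correctly, I think you're looking for apply_along_axis. Using numpy's built-in broadcasting, we can simply subtract the point from the array: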

>>> a - numpy.array([1989, 1332])
array([[[-285,  -92],
        [-244,  -88],
        [ -17,  -42],
        [ 140,   63],
        [   0,    0]],

       [[-277,  -86],
        [-239,  -86],
        [ -25,  -46],
        [ 149,   67],
        [   0,    1]],

       [[-268,  -83],
        [-233,  -83],
        [ -34,  -49],
        [ 156,   67],
        [   1,    1]]])

Then we can apply numpy.linalg.norm to it:

>>> dist = a - numpy.array([1989, 1332])
>>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
>>> normed
array([[ 299.48121811,  259.38388539,   45.31004304,  
         153.5219854 ,    0.        ],
       [ 290.04310025,  254.0019685 ,   52.35456045,  
         163.37074401,    1.        ],
       [ 280.55837182,  247.34186868,   59.6405902 ,  
         169.77926846,    1.41421356]])

Finally, some boolean mask trickery, along with a couple of reshape calls:

>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])

Joe Kington's answer is faster though. Oh well. I'll leave this for posterity.

def joes(data, point):
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape((3, 4, 2))

def mine(a, point):
    dist = a - point
    normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))

>>> %timeit mine(data, point)
1000 loops, best of 3: 586 us per loop
>>> %timeit joes(data, point)
10000 loops, best of 3: 48.9 us per loop
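Much of that gap probably comes from apply_along_axis calling the norm function once per coordinate pair from Python. As a rough sketch on made-up data, numpy.linalg.norm also accepts an axis argument, which gives the same distances in a single vectorised call:

```python
import numpy as np

# Made-up coordinates in the same (groups, points, xy) layout.
a = np.array([[[1704, 1240], [1972, 1290], [1989, 1332]],
              [[1712, 1246], [1964, 1286], [1989, 1333]]])
dist = a - np.array([1989, 1332])

# Row-by-row evaluation through apply_along_axis...
slow = np.apply_along_axis(np.linalg.norm, 2, dist)
# ...and the same result from one call using the axis argument.
fast = np.linalg.norm(dist, axis=2)

print(np.allclose(slow, fast))
```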

Answer 3

There are multiple ways of doing this, but here is one using list comprehensions:

Distance function:

In [35]: from numpy.linalg import norm

In [36]: dist = lambda x,y:norm(x-y)

Input data:

In [39]: GivenMatrix = scipy.rand(3, 5, 2)

In [40]: GivenMatrix
Out[40]: 
array([[[ 0.83798666,  0.90294439],
        [ 0.8706959 ,  0.88397176],
        [ 0.91879085,  0.93512921],
        [ 0.15989245,  0.57311869],
        [ 0.82896003,  0.53589968]],

       [[ 0.0207089 ,  0.9521768 ],
        [ 0.94523963,  0.31079109],
        [ 0.41929482,  0.88559614],
        [ 0.87885236,  0.45227422],
        [ 0.58365369,  0.62095507]],

       [[ 0.14757177,  0.86101539],
        [ 0.58081214,  0.12632764],
        [ 0.89958321,  0.73660852],
        [ 0.3408943 ,  0.45420989],
        [ 0.42656333,  0.42770216]]])

In [41]: q = scipy.rand(2)

In [42]: q
Out[42]: array([ 0.03280889,  0.71057403])

Computing the output distances:

In [44]: distances = [[dist(x, q) for x in SubMatrix] 
                      for SubMatrix in GivenMatrix]

In [45]: distances
Out[45]: 
[[0.82783910695733931,
  0.85564093542511577,
  0.91399620574915652,
  0.18720096539588818,
  0.81508758596405939],
 [0.24190557184498068,
  0.99617079746515047,
  0.42426891258164884,
  0.88459501973012633,
  0.55808740166908177],
 [0.18921712490174292,
  0.80103146210692744,
  0.86716521557255788,
  0.40079819635686459,
  0.48482888965287363]]

To rank the results for each submatrix:

In [46]: scipy.argsort(distances)
Out[46]: 
array([[3, 4, 0, 1, 2],
       [0, 2, 4, 3, 1],
       [0, 3, 4, 1, 2]])

As for the deletion, I personally think it is easiest to convert GivenMatrix to a list and then use del:

>>> GivenList = GivenMatrix.tolist()

>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix
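If you would rather stay with arrays than convert to a list, one possible alternative is np.delete applied per submatrix. This is only a sketch on random stand-in data; given, closest and trimmed are hypothetical names, and it uses the newer numpy.random.default_rng generator rather than the calls shown above.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
given = rng.random((3, 5, 2))           # stand-in for GivenMatrix
q = rng.random(2)

distances = np.linalg.norm(given - q, axis=2)
closest = distances.argmin(axis=1)      # index of the nearest row in each submatrix

# np.delete returns a new array, so each submatrix is trimmed independently
# and the pieces are stacked back into one (3, 4, 2) array.
trimmed = np.stack([np.delete(sub, i, axis=0)
                    for sub, i in zip(given, closest)])
print(trimmed.shape)    # (3, 4, 2)
```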

Answer 1
4 (accepted) 2012-06-16 02:04:19

Answer 2
1 2012-06-16 02:20:24

Answer 3
0 2012-06-15 23:51:03
