Indices in a numpy array where a slice is in another array

Question

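The actual problem arises in a machine learning application, where the data is a bit complex. So here is an MWE that captures the essence of the problem:

I have two arrays as follows: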

import numpy as np

L = np.arange(12).reshape(4,3)
M = np.arange(12).reshape(6,2)

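Now I want to find the rows R in L such that some row in M is made up of all the elements of R except the last one.

With the example code above, L and M look like this: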

array([[ 0,  1,  2],  # L
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

array([[ 0,  1],  # M
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

From these, I would like the matching rows of L (here the rows that start with 0, 1 and with 6, 7), as a numpy array:

array([[ 0,  1,  2],
       [ 6,  7,  8]])

If I represented L and M as Python lists, I would do this:

L = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
M = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]
answer = [R for R in L if R[:-1] in M]

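Now, I know I could use a similar list comprehension in numpy and convert the result to an array, but numpy, awesome as it is, probably has a more elegant way of doing this that I don't know about.

I tried looking into np.where (to get the required indices, which I could then use to index L), but that doesn't seem to do what I need.

I'd appreciate any help.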

Answer 1

OK, I think I got it. The trick is to add another dimension to M, and then you can use broadcasting:

M.shape += (1,)
E = np.all(L[:,:-1].T == M, 1)

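You get a 6x4 boolean matrix E, which holds the result of comparing every row of M with the first two columns of every row of L.

From here it is easy to finish: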

result = L[np.any(E,0)]

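This simplifies the solution: you don't need any lambda functions or "implicit loops" (e.g. np.apply_along_axis()).

Yes, numpy vectorization is beautiful (but sometimes you have to think very abstractly)...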
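The two steps above can be put together as a self-contained sketch. It assumes the L and M from the question and uses M[:, :, None] in place of the in-place M.shape += (1,), so that M itself stays two-dimensional; the name mask is only illustrative.

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# M[:, :, None] has shape (6, 2, 1) and L[:, :-1].T has shape (2, 4),
# so the comparison broadcasts to (6, 2, 4).
# np.all over axis=1 requires both elements of a row to match.
E = np.all(M[:, :, None] == L[:, :-1].T, axis=1)   # shape (6, 4)

# A row of L is kept if it matches at least one row of M.
mask = np.any(E, axis=0)                           # shape (4,)
result = L[mask]
print(result)
```

Because M is not reshaped in place here, the snippet can be re-run, or combined with the other answers, without first restoring M.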

Answer 2

Very similar to Bitwise's answer:

def fn(a):
    return lambda b: np.all(a==b, axis=1)
matches = np.apply_along_axis(fn(M), 1, L[:,:2])
result = L[np.any(matches, axis=1)]

What happens under the hood is this (I'll use Bitwise's example, which is easier to demonstrate):

>>> M
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> M.shape+=(1,)
>>> M
array([[[ 0],
        [ 1]],

       [[ 2],
        [ 3]],

       [[ 4],
        [ 5]],

       [[ 6],
        [ 7]],

       [[ 8],
        [ 9]],

       [[10],
        [11]]])

Here we have added another dimension to the M array, which now has shape (6,2,1).

>>> L2 = L[:,:-1].T

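Then we drop the last column of L and transpose the array, so that its dimensions are (2,4).

And here is the magic: M and L2 are now broadcastable to an array of dimension (6,2,4).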

As numpy's documentation states:

A set of arrays is called "broadcastable" to the same shape if the above rules produce a valid result, i.e., one of the following is true:

1. The arrays all have exactly the same shape.
2. The arrays all have the same number of dimensions and the length of each dimension is either a common length or 1.
3. The arrays that have too few dimensions can have their shapes prepended with a dimension of length 1 to satisfy property 2.

Example

If a.shape is (5,1), b.shape is (1,6), c.shape is (6,) and d.shape is () so that d is a scalar, then a, b, c, and d are all broadcastable to dimension (5,6); and

- a acts like a (5,6) array where a[:,0] is broadcast to the other columns,
- b acts like a (5,6) array where b[0,:] is broadcast to the other rows,
- c acts like a (1,6) array and therefore like a (5,6) array where c[:] is broadcast to every row, and finally,
- d acts like a (5,6) array where the single value is repeated.
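The documentation's example can be checked directly. This is a small sketch with placeholder arrays of the stated shapes; the contents (zeros and a constant) are arbitrary and only the shapes matter.

```python
import numpy as np

a = np.zeros((5, 1))
b = np.zeros((1, 6))
c = np.zeros(6)
d = np.array(1.0)          # zero-dimensional array, shape ()

# np.broadcast computes the common shape without allocating a result.
shape = np.broadcast(a, b, c, d).shape
print(shape)               # (5, 6)

# broadcast_arrays returns one (5, 6) view per input.
views = np.broadcast_arrays(a, b, c, d)
print([v.shape for v in views])
```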

M[:,:,0] will be repeated 4 times to fill the third dimension, and L2 will be prepended with a new dimension and repeated 6 times to fill it.
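As a quick check of that description, the following sketch confirms the broadcast shape and the two repetitions. It assumes the L and M from the question; M3 is an illustrative name for M with the extra trailing axis.

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

M3 = M[:, :, None]     # shape (6, 2, 1)
L2 = L[:, :-1].T       # shape (2, 4)

print(np.broadcast(L2, M3).shape)   # (6, 2, 4)

L2_b, M3_b = np.broadcast_arrays(L2, M3)

# L2 is repeated along the new leading axis (6 copies) ...
l2_repeated = all((L2_b[i] == L2).all() for i in range(6))
# ... and M is repeated along the last axis (4 copies).
m_repeated = all((M3_b[:, :, j] == M).all() for j in range(4))
print(l2_repeated, m_repeated)
```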

>>> B = np.broadcast_arrays(L2,M)
>>> B
[array([[[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]]]),


array([[[ 0,  0,  0,  0],
        [ 1,  1,  1,  1]],

       [[ 2,  2,  2,  2],
        [ 3,  3,  3,  3]],

       [[ 4,  4,  4,  4],
        [ 5,  5,  5,  5]],

       [[ 6,  6,  6,  6],
        [ 7,  7,  7,  7]],

       [[ 8,  8,  8,  8],
        [ 9,  9,  9,  9]],

       [[10, 10, 10, 10],
        [11, 11, 11, 11]]])]

We can now compare them element-wise:

>>> np.equal(*B)
array([[[ True, False, False, False],
        [ True, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False,  True, False],
        [False, False,  True, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]]], dtype=bool)

Compare row to row, reducing with np.all along axis=1:

>>> np.all(np.equal(*B), axis=1)
array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)

And aggregate over the rows of M (axis=0), which leaves one value per row of L:

>>> C = np.any(np.all(np.equal(*B), axis=1), axis=0)
>>> C
array([ True, False,  True, False], dtype=bool)

This gives you the boolean mask to apply to L.

>>> L[C]
array([[0, 1, 2],
       [6, 7, 8]])
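Since the question asks for indices, the mask can also be turned into row positions. A minimal sketch, assuming the same L and M as in the question; the names mask and idx are illustrative, and M[:, :, None] stands in for the in-place reshape so that M is left untouched.

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# Same mask as C in the walkthrough, built without modifying M.
mask = np.any(np.all(M[:, :, None] == L[:, :-1].T, axis=1), axis=0)

# Positions of the True entries, i.e. the indices of the matching rows of L.
idx = np.flatnonzero(mask)
print(idx)       # [0 2]
print(L[idx])    # the same rows as L[mask]
```

np.where(mask)[0] returns the same index array, which is the use of np.where that the question was looking for.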

apply_along_axis relies on the same comparison, but it reduces L's dimension instead of increasing M's (which adds an implicit loop).
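To make the "implicit loop" concrete, here is a rough sketch of the same computation with the loop over the rows of L written out. It illustrates the idea only and is not meant to describe how apply_along_axis is implemented internally; it assumes a two-dimensional M, and the name mask is illustrative.

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# One row of L at a time: M == row[:-1] broadcasts the (2,) prefix
# against the (6, 2) array, np.all(axis=1) tests whole rows of M,
# and np.any asks whether at least one row of M matched.
mask = np.array([np.any(np.all(M == row[:-1], axis=1)) for row in L])
result = L[mask]
print(result)
```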

Answer 3

>>> import hashlib
>>> fn = lambda xs: hashlib.sha1(xs).hexdigest()
>>> m = np.apply_along_axis(fn, 1, M)
>>> l = np.apply_along_axis(fn, 1, L[:,:-1])
>>> L[np.in1d(l, m)]
array([[0, 1, 2],
       [6, 7, 8]])
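This answer comes without an explanation. One reading of it is that each row is reduced to a single hashable key, after which row membership becomes ordinary one-dimensional membership. The sketch below follows that reading but swaps in a different technique: it uses the raw bytes of each row as the key and a Python set for the lookup, where the answer uses SHA-1 digests and np.in1d. It assumes L and M share a dtype, since the bytes of equal values differ across dtypes; m_keys and mask are illustrative names.

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# Key every row of M by its raw bytes (assumes L and M have the same dtype).
m_keys = {row.tobytes() for row in M}

# A row of L matches if the bytes of its prefix are one of those keys.
mask = np.array([row[:-1].tobytes() in m_keys for row in L])
result = L[mask]
print(result)
```

Recent NumPy releases steer users toward np.isin in place of np.in1d, so a set or np.isin may be the safer choice if the digest-based version is adapted.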

Answer 4

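>>> # Compare as lists: `row[:-1] in M` on an ndarray is true when any single element matches.
>>> M_rows = M.tolist()
>>> print(np.array([row for row in L if row[:-1].tolist() in M_rows]))
[[0 1 2]
 [6 7 8]]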
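One caveat when testing row membership with `in` on NumPy arrays: `x in M` evaluates `(M == x).any()`, so it is true as soon as any single element lines up, not only when a whole row matches. The example data happens to hide this. The sketch below uses a hypothetical prefix [0, 3], which is not a row of M, to show the difference; probe, loose, strict and vectorized are illustrative names.

```python
import numpy as np

M = np.arange(12).reshape(6, 2)
probe = np.array([0, 3])    # hypothetical prefix; not a row of M

# ndarray membership: element-wise comparison followed by any().
loose = probe in M

# Whole-row comparison on plain Python lists.
strict = probe.tolist() in M.tolist()

# Whole-row comparison kept inside NumPy.
vectorized = bool(np.any(np.all(M == probe, axis=1)))

print(loose, strict, vectorized)   # True False False
```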

4 answers

Answer 1
4 2014-03-22 22:19:50

Answer 2
3 2014-03-22 23:19:28

Answer 3
0 accepted 2014-03-22 22:29:42

Answer 4
0 2014-03-22 23:05:35
