Improving performance by avoiding a Python for loop when using numpy unique with counts

Question

I have two numpy arrays: A with shape (N,3) and B with shape (N,). From A I generate an array containing only its unique rows, for example:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])

B = np.array([10., 33., 15., 17.])

# Unique rows of A, plus first-occurrence indices, inverse mapping and counts
AUnique, directInd, inverseInd, counts = np.unique(A,
                                             return_index=True,
                                             return_inverse=True,
                                             return_counts=True,
                                             axis=0)

So AUnique will be array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
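To make the roles of the other three return values concrete, here is a minimal sketch on the same example data. The names first_rows and rebuilt are illustrative only, and the np.ravel call is a defensive assumption, since some NumPy releases return the inverse with an extra dimension when axis is given.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])

AUnique, directInd, inverseInd, counts = np.unique(
    A, return_index=True, return_inverse=True, return_counts=True, axis=0)

# Assumption: we want a 1-D inverse; flatten in case this NumPy version
# returned it with an extra dimension.
inverseInd = np.ravel(inverseInd)

# directInd holds, for each unique row, the index of its first occurrence in A
first_rows = A[directInd]

# inverseInd holds, for each row of A, the index of its unique row in AUnique
rebuilt = AUnique[inverseInd]

print(directInd)   # [0 1 3]
print(inverseInd)  # [0 1 0 2]
print(counts)      # [2 1 1]
```

In other words, A[directInd] reproduces AUnique, and AUnique[inverseInd] reproduces A.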

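Then I build the corresponding vector of B values associated with AUnique, and for every row that occurs more than once in A, I sum the associated values of B into that vector, i.e.: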

BNew = B[directInd]

# here BNew is [10., 33., 17.]

for Id in np.nonzero(counts > 1)[0]:
    BNew[Id] = np.sum(B[inverseInd == Id])

# here BNew is [25., 33., 17.]

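The problem is that for large N (millions or tens of millions of rows) the for loop becomes very slow. Is there a way to avoid the loop and/or make the code faster?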

Thanks in advance!

Answer 1

I think you can do what you want with np.bincount:

BNew = np.bincount(inverseInd, weights = B)
BNew

Out[]: array([25., 33., 17.])
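As a sanity check, the sketch below compares the loop from the question with np.bincount on random data, and cross-checks both against np.add.at as a further option. The data, the seed, and the names BLoop, BBincount, BAddAt and same are assumptions made for illustration. The np.ravel call guards against NumPy versions that return the inverse with an extra dimension when axis is given.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; illustrative data only
N = 1000
# A small value range guarantees many duplicate rows
A = rng.integers(0, 5, size=(N, 3)).astype(float)
B = rng.random(N)

AUnique, directInd, inverseInd, counts = np.unique(
    A, return_index=True, return_inverse=True, return_counts=True, axis=0)
inverseInd = np.ravel(inverseInd)  # bincount needs a 1-D index array

# Loop from the question: one full scan of inverseInd per duplicated row
BLoop = B[directInd]
for Id in np.nonzero(counts > 1)[0]:
    BLoop[Id] = np.sum(B[inverseInd == Id])

# Vectorized: sum the weights B into bins given by inverseInd
BBincount = np.bincount(inverseInd, weights=B, minlength=len(AUnique))

# Alternative: unbuffered in-place accumulation
BAddAt = np.zeros(len(AUnique))
np.add.at(BAddAt, inverseInd, B)

same = np.allclose(BLoop, BBincount) and np.allclose(BLoop, BAddAt)
print(same)
```

Note that np.bincount only accepts 1-D weights, so if B had several columns, one option would be calling it once per column or using np.add.at instead.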

Answer 1 (accepted 2020-01-14 13:40:38)
