
Looping over large sparse array

Let's say I have a (sparse) matrix M of size (N*N, N*N). I want to select the elements of this matrix where the outer product of grid with itself is True. Here grid is a boolean 2D array of shape (n,m) with n*m=N, and na=grid.sum(). This can be done as follows:

result = M[np.outer( grid.flatten(),grid.flatten() )].reshape (( N, N ) )

result is an (na,na) sparse array (and na < N). The previous line is what I want to achieve: keep the elements of M where the outer product of grid is True, and squeeze the rest out of the array.
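To make the target concrete, here is a minimal dense sketch of the selection. It assumes M is (N, N) so that the boolean mask matches its shape, and it reshapes to (na, na), the size the selected elements actually fill; the 2x2 grid values are made up for illustration.

```python
import numpy as np

# Assumption: M is taken as (N, N) so the (N, N) boolean mask fits it.
n, m = 2, 2
N = n * m
grid = np.array([[True, False],
                 [True, True]])           # hypothetical example grid
na = int(grid.sum())                      # 3 True cells

M = np.arange(N * N).reshape(N, N)        # dense stand-in for the sparse M
g = grid.flatten()                        # [True, False, True, True]

mask = np.outer(g, g)                     # (N, N) boolean outer product
result = M[mask].reshape(na, na)          # keep True positions, drop the rest
# result is [[0, 2, 3], [8, 10, 11], [12, 14, 15]]

# The same block, selected by rows and columns without building the mask.
same = M[np.ix_(g, g)]
```

The second form hints at why the full outer product may not be needed: the True entries always form a row and column selection.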

As n and m (and hence N) grow, and since M and result are sparse matrices, I am not able to do this efficiently in terms of memory or speed. The closest I have come is:

result = sp.lil_matrix ( (1, N*N), dtype=np.float32 )
# Calculate outer product
A = np.einsum("i,j", grid.flatten(), grid.flatten())  
cntr = 0
it = np.nditer ( A, flags=['multi_index'] )
while not it.finished:
    if it[0]:
        result[0,cntr] = M[it.multi_index[0], it.multi_index[1]]
        cntr += 1
# reshape result to be a N*N sparse matrix

The last reshape could be done by this approach, but I haven't got there yet, as the while loop is taking forever.

I have also tried selecting the nonzero elements of A and looping over them, but this eats up all of the memory:

A=np.einsum("i,j", grid.flatten(), grid.flatten())  
nzero = A.nonzero() # This eats lots of memory
cntr = 0
for (i,j) in zip (*nzero):
    temp_mat[0,cntr] = M[i,j]
    cntr += 1

n and m in the example above are around 300.
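A rough back-of-the-envelope estimate shows why this runs out of memory at that size. The sketch assumes A is stored as one byte per element (a boolean array) and that nonzero() returns two 64-bit index arrays; the value of na is hypothetical, since the question does not give it.

```python
n = m = 300
N = n * m                              # 90_000 entries in grid.flatten()

# Dense (N, N) outer product at one byte per boolean element.
dense_bool_bytes = N * N               # 8_100_000_000 bytes, about 8.1 GB

# Hypothetical: suppose half of the grid cells are True.
na = N // 2
# nonzero() yields two index arrays with na*na entries of 8 bytes each.
nonzero_bytes = 2 * na * na * 8        # 32_400_000_000 bytes, about 32.4 GB

print(dense_bool_bytes / 1e9, nonzero_bytes / 1e9)
```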

I don't know if it was a typo or a code error, but your example is missing an iternext:

R=[]
it = np.nditer ( A, flags=['multi_index'] )
while not it.finished:
    if it[0]:
        R.append(M[it.multi_index])
    it.iternext()

I think appending to a list is simpler and faster than R[cntr]=... . Appending is competitive with indexed assignment when R is a regular array, and assigning into a sparse matrix by index is slower still, even in lil, the fastest format for that.

ndindex wraps this use of nditer:

R=[]
for index in np.ndindex(A.shape):
    if A[index]:
        R.append(M[index])

ndenumerate also works:

R = []
for index, a in np.ndenumerate(A):
    if a:
        R.append(M[index])
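As a quick sanity check, the three loop styles can be run side by side on a small dense example. The arrays below are illustrative stand-ins (a dense (N, N) M and a 0/1 vector a), not the data from the question.

```python
import numpy as np

N = 4
M = np.arange(N * N).reshape(N, N)     # dense stand-in for M
a = np.array([0, 1, 0, 1])
A = np.outer(a, a)                     # nonzero at (1,1), (1,3), (3,1), (3,3)

# Explicit nditer loop, including the iternext() call.
R_nditer = []
it = np.nditer(A, flags=['multi_index'])
while not it.finished:
    if it[0]:
        R_nditer.append(M[it.multi_index])
    it.iternext()

# The same traversal via ndindex and ndenumerate.
R_ndindex = [M[idx] for idx in np.ndindex(A.shape) if A[idx]]
R_ndenum = [M[idx] for idx, val in np.ndenumerate(A) if val]
# All three lists hold 5, 7, 13, 15.
```

All three still visit every one of the N*N positions in Python, so they only differ in convenience, not in scaling.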

But I wonder if you really want to advance cntr on every it step, not just in the True cases. Otherwise reshaping result to (N,N) doesn't make much sense. But in that case, isn't your problem just

M[:N, :N].multiply(A)

or, if M were a dense array:

M[:N, :N]*A

In fact, if both M and A are sparse, then the .data attribute of that multiply will be the same as the R list.

In [76]: N=4
In [77]: M=np.arange(N*N*N*N).reshape(N*N,N*N)
In [80]: a=np.array([0,1,0,1])
In [81]: A=np.einsum('i,j',a,a)
In [82]: A
Out[82]: 
array([[0, 0, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 0],
       [0, 1, 0, 1]])

In [83]: M[:N, :N]*A
Out[83]: 
array([[ 0,  0,  0,  0],
       [ 0, 17,  0, 19],
       [ 0,  0,  0,  0],
       [ 0, 49,  0, 51]])

In [84]: c=sparse.csr_matrix(M)[:N,:N].multiply(sparse.csr_matrix(A))
In [85]: c.data
Out[85]: array([17, 19, 49, 51], dtype=int32)

In [89]: [M[index] for index, a in np.ndenumerate(A) if a]
Out[89]: [17, 19, 49, 51]
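If the compact (na, na) block is what is wanted rather than the masked (N, N) product, one possible alternative (not part of the answer above) is to index the sparse matrix by the flat positions of the True cells, which never builds the (N, N) mask. This is only a sketch: it assumes M is (N, N) and in CSR format, and the small arrays are made up for illustration.

```python
import numpy as np
from scipy import sparse

N = 4
a = np.array([0, 1, 0, 1], dtype=bool)            # stands in for grid.flatten()
M = sparse.csr_matrix(np.arange(N * N).reshape(N, N))

idx = np.flatnonzero(a)                           # positions of True cells: [1, 3]

# Select rows first, then columns; the result stays sparse.
sub = M[idx][:, idx]                              # shape (na, na)
print(sub.toarray())                              # [[ 5  7]
                                                  #  [13 15]]
```

The memory cost here scales with the number of stored entries in M and with na, not with N*N.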
