
Quick (sparse) matrix construction by certain vector elements

I have two numpy vectors a and b and would like to construct an (ideally sparse) matrix C as follows:

C[i,j] = (a[j]>b[j])*(a[i]+b[j])

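So only the elements whose j index satisfies a certain relation are computed, and they are computed from some other elements. Again, this is easy to do by looping over i and j, but I wonder whether there is a more efficient "numpy/scipy" way? In short, I don't know how to handle the different indices involved.

Thanks in advance!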

I should insist on a [MCVE], but I'll make one up.

Two arrays of the same size:

In [232]: a = np.random.randint(0,10,10)
In [233]: b = np.random.randint(0,10,10)

The a[j]>b[j] part of the calculation:

In [234]: a>b
Out[234]: 
array([ True, False, False, False, False,  True, False, False, False,
       False])

The sum part, a[i]+b[j]:

In [235]: a[:,None]+b
Out[235]: 
array([[ 9,  9, 10, 14, 13, 11, 11, 12,  9, 14],
       [ 4,  4,  5,  9,  8,  6,  6,  7,  4,  9],
       [ 7,  7,  8, 12, 11,  9,  9, 10,  7, 12],
       [10, 10, 11, 15, 14, 12, 12, 13, 10, 15],
       [12, 12, 13, 17, 16, 14, 14, 15, 12, 17],
       [12, 12, 13, 17, 16, 14, 14, 15, 12, 17],
       [ 6,  6,  7, 11, 10,  8,  8,  9,  6, 11],
       [11, 11, 12, 16, 15, 13, 13, 14, 11, 16],
       [ 7,  7,  8, 12, 11,  9,  9, 10,  7, 12],
       [12, 12, 13, 17, 16, 14, 14, 15, 12, 17]])

Set the desired columns to 0:

In [236]: _[:,np.nonzero(a>b)] = 0
In [237]: _
Out[237]: 
array([[ 0,  9, 10, 14, 13,  0, 11, 12,  9, 14],
       [ 0,  4,  5,  9,  8,  0,  6,  7,  4,  9],
       [ 0,  7,  8, 12, 11,  0,  9, 10,  7, 12],
       [ 0, 10, 11, 15, 14,  0, 12, 13, 10, 15],
       [ 0, 12, 13, 17, 16,  0, 14, 15, 12, 17],
       [ 0, 12, 13, 17, 16,  0, 14, 15, 12, 17],
       [ 0,  6,  7, 11, 10,  0,  8,  9,  6, 11],
       [ 0, 11, 12, 16, 15,  0, 13, 14, 11, 16],
       [ 0,  7,  8, 12, 11,  0,  9, 10,  7, 12],
       [ 0, 12, 13, 17, 16,  0, 14, 15, 12, 17]])

Oops, I got that backwards: I should have set the other columns to 0.

But we don't have to do this in separate steps:

In [238]: (a>b)*(a[:,None]+b)
Out[238]: 
array([[ 9,  0,  0,  0,  0, 11,  0,  0,  0,  0],
       [ 4,  0,  0,  0,  0,  6,  0,  0,  0,  0],
       [ 7,  0,  0,  0,  0,  9,  0,  0,  0,  0],
       [10,  0,  0,  0,  0, 12,  0,  0,  0,  0],
       [12,  0,  0,  0,  0, 14,  0,  0,  0,  0],
       [12,  0,  0,  0,  0, 14,  0,  0,  0,  0],
       [ 6,  0,  0,  0,  0,  8,  0,  0,  0,  0],
       [11,  0,  0,  0,  0, 13,  0,  0,  0,  0],
       [ 7,  0,  0,  0,  0,  9,  0,  0,  0,  0],
       [12,  0,  0,  0,  0, 14,  0,  0,  0,  0]])

This calculation depends heavily on broadcasting. If you are not familiar with that, it won't make much sense.

The arrays:
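To make the broadcasting in [238] more concrete, here is a minimal sketch that prints the intermediate shapes and checks the one-line expression against a literal double loop over i and j. The helper name `loop_reference` is made up for illustration and is not part of the original answer; the arrays are the ones shown in [239] and [240].

```python
import numpy as np

# Arrays from the session (In [239] and In [240]).
a = np.array([5, 0, 3, 6, 8, 8, 2, 7, 3, 8])
b = np.array([4, 4, 5, 9, 8, 6, 6, 7, 4, 9])

def loop_reference(a, b):
    """Literal translation of C[i,j] = (a[j]>b[j])*(a[i]+b[j]) (hypothetical helper)."""
    n = len(a)
    C = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            C[i, j] = (a[j] > b[j]) * (a[i] + b[j])
    return C

col = a[:, None]   # shape (10, 1): a[i] varies down the rows
sums = col + b     # (10, 1) + (10,) -> (10, 10), so sums[i, j] = a[i] + b[j]
mask = a > b       # shape (10,): one flag per column j
C = mask * sums    # the mask broadcasts across rows and zeroes columns where a[j] <= b[j]

print(col.shape, sums.shape, mask.shape, C.shape)
print(np.array_equal(C, loop_reference(a, b)))
```

The key point is that a 1-D array lines up with the last axis, so `a > b` acts on columns, while `a[:, None]` moves `a` onto the row axis.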

In [239]: a
Out[239]: array([5, 0, 3, 6, 8, 8, 2, 7, 3, 8])
In [240]: b
Out[240]: array([4, 4, 5, 9, 8, 6, 6, 7, 4, 9])

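When doing whole-array numpy calculations, the fact that many (or just a few) columns will be 0 doesn't make much difference. Working with the whole arrays is usually simpler and faster.

You could iterate over the columns and do the addition only for the selected ones. There is no need to iterate over the rows. But usually this will be slower than [238]: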

In [245]: c =  np.zeros((10,10),int)
     ...: for j in range(10):
     ...:     if a[j]>b[j]:
     ...:         c[:,j] = a + b[j]

Or move the comparison out of the loop:

In [249]: c =  np.zeros((10,10),int)
     ...: m = a>b
     ...: for j,v in enumerate(m):
     ...:     if v:
     ...:         c[:,j] = a + b[j]

Better yet:

In [253]: c =  np.zeros((10,10),int)
     ...: idx = np.nonzero(a>b)[0]
     ...: print(idx)
     ...: c[:,idx] = a[:,None] + b[idx]

For some degree of sparsity the last one may be faster, but not here:

In [256]: %%timeit
     ...: c =  np.zeros((10,10),int)
     ...: idx = np.nonzero(a>b)[0]
     ...: c[:,idx] = a[:,None] + b[idx]
17.7 µs ± 772 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [258]: timeit (a>b)*(a[:,None]+b)
10.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
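The timings above are for 10x10 arrays with the session's data. To see how the comparison behaves for other sizes and sparsity levels, something like the following sketch could be used. The function names, the size `n = 500`, and the value ranges are assumptions chosen for illustration; no particular result is claimed, since it depends on the machine and on the data.

```python
import timeit
import numpy as np

def full_broadcast(a, b):
    # The one-liner from [238].
    return (a > b) * (a[:, None] + b)

def selected_columns(a, b):
    # The indexed assignment from [253].
    c = np.zeros((len(a), len(b)), dtype=int)
    idx = np.nonzero(a > b)[0]
    c[:, idx] = a[:, None] + b[idx]
    return c

rng = np.random.default_rng(0)
n = 500                        # assumed size, adjust as needed
a = rng.integers(0, 10, n)
b = rng.integers(0, 100, n)    # a wider range for b makes a[j] > b[j] rare

# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(full_broadcast(a, b), selected_columns(a, b))

for f in (full_broadcast, selected_columns):
    seconds = timeit.timeit(lambda: f(a, b), number=20) / 20
    print(f.__name__, seconds)
```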

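There is nothing wrong with using a loop. One property you can exploit: the condition a[j]>b[j] tells you whether an entire column is zero, so if you build the result with scipy.sparse.coo_matrix((data,(i,j))), it makes sense to put the outer loop over the columns: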

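from scipy import sparse

# a and b are the equal-length 1-D numpy arrays from the question
n = len(a)
data = []
i = []
j = []
for jx in range(n):
    if a[jx] > b[jx]:
        for ix in range(n):  # this whole loop is skipped if the condition is not satisfied
            data.append(a[ix] + b[jx])
            i.append(ix)
            j.append(jx)
C = sparse.coo_matrix((data, (i, j)), shape=(n, n))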
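The same (data, (i, j)) triplets can also be assembled without Python-level loops. This is a sketch of one possible way, not the answerer's method: it uses `np.nonzero`, `np.tile`, and `np.repeat` to generate the row and column indices of the kept columns, and reuses the sample arrays from the first answer so the result can be checked against the dense expression.

```python
import numpy as np
from scipy import sparse

# Sample arrays from the first answer (In [239] and In [240]).
a = np.array([5, 0, 3, 6, 8, 8, 2, 7, 3, 8])
b = np.array([4, 4, 5, 9, 8, 6, 6, 7, 4, 9])
n = len(a)

cols = np.nonzero(a > b)[0]            # columns j with a[j] > b[j]
i = np.tile(np.arange(n), len(cols))   # all row indices, once per kept column
j = np.repeat(cols, n)                 # each kept column index, n times
data = a[i] + b[j]                     # a[i] + b[j] for the kept entries only

C = sparse.coo_matrix((data, (i, j)), shape=(n, n))
print(C.nnz)
print(np.array_equal(C.toarray(), (a > b) * (a[:, None] + b)))
```

Note that a kept column stores all n entries, including any where a[i]+b[j] happens to be 0, so the number of stored values is n times the number of kept columns.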
        
