Quick (sparse) matrix construction by certain vector elements
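I have two numpy vectors a and b and would like to construct an (ideally sparse) matrix C as follows:
C[i,j] = (a[j]>b[j])*(a[i]+b[j])
So an entry is nonzero only when the elements at index j satisfy a particular relation, and its value is then computed from elements at other indices. Again, this is easy to do by looping over i & j, but I wonder whether there is a more efficient "numpy/scipy" way? In short, I don't know how to handle the mismatch between the indices involved.
Thanks in advance!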
I should insist on a [MCVE], but I'll make one up, using arrays of the same size:
In [232]: a = np.random.randint(0,10,10)
In [233]: b = np.random.randint(0,10,10)
The a[j]>b[j] part of the calculation:
In [234]: a>b
Out[234]:
array([ True, False, False, False, False, True, False, False, False,
False])
The sum part, a[i]+b[j]:
In [235]: a[:,None]+b
Out[235]:
array([[ 9, 9, 10, 14, 13, 11, 11, 12, 9, 14],
[ 4, 4, 5, 9, 8, 6, 6, 7, 4, 9],
[ 7, 7, 8, 12, 11, 9, 9, 10, 7, 12],
[10, 10, 11, 15, 14, 12, 12, 13, 10, 15],
[12, 12, 13, 17, 16, 14, 14, 15, 12, 17],
[12, 12, 13, 17, 16, 14, 14, 15, 12, 17],
[ 6, 6, 7, 11, 10, 8, 8, 9, 6, 11],
[11, 11, 12, 16, 15, 13, 13, 14, 11, 16],
[ 7, 7, 8, 12, 11, 9, 9, 10, 7, 12],
[12, 12, 13, 17, 16, 14, 14, 15, 12, 17]])
Set the selected columns to 0:
In [236]: _[:,np.nonzero(a>b)] = 0
In [237]: _
Out[237]:
array([[ 0, 9, 10, 14, 13, 0, 11, 12, 9, 14],
[ 0, 4, 5, 9, 8, 0, 6, 7, 4, 9],
[ 0, 7, 8, 12, 11, 0, 9, 10, 7, 12],
[ 0, 10, 11, 15, 14, 0, 12, 13, 10, 15],
[ 0, 12, 13, 17, 16, 0, 14, 15, 12, 17],
[ 0, 12, 13, 17, 16, 0, 14, 15, 12, 17],
[ 0, 6, 7, 11, 10, 0, 8, 9, 6, 11],
[ 0, 11, 12, 16, 15, 0, 13, 14, 11, 16],
[ 0, 7, 8, 12, 11, 0, 9, 10, 7, 12],
[ 0, 12, 13, 17, 16, 0, 14, 15, 12, 17]])
Oops, I've got that switched; I should be setting the other columns to 0.
But we don't have to do this in a separate step:
In [238]: (a>b)*(a[:,None]+b)
Out[238]:
array([[ 9, 0, 0, 0, 0, 11, 0, 0, 0, 0],
[ 4, 0, 0, 0, 0, 6, 0, 0, 0, 0],
[ 7, 0, 0, 0, 0, 9, 0, 0, 0, 0],
[10, 0, 0, 0, 0, 12, 0, 0, 0, 0],
[12, 0, 0, 0, 0, 14, 0, 0, 0, 0],
[12, 0, 0, 0, 0, 14, 0, 0, 0, 0],
[ 6, 0, 0, 0, 0, 8, 0, 0, 0, 0],
[11, 0, 0, 0, 0, 13, 0, 0, 0, 0],
[ 7, 0, 0, 0, 0, 9, 0, 0, 0, 0],
[12, 0, 0, 0, 0, 14, 0, 0, 0, 0]])
This kind of calculation depends heavily on broadcasting. If you aren't familiar with that, it won't make much sense.
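To make the broadcasting step more concrete, here is a minimal sketch that prints the shapes involved and checks the one-line expression against the double loop from the question. The intermediate names (col, sums, mask, C_loop) are only illustrative; the values of a and b are the ones used in this answer.

```python
import numpy as np

# Same values as the arrays used in this answer
a = np.array([5, 0, 3, 6, 8, 8, 2, 7, 3, 8])
b = np.array([4, 4, 5, 9, 8, 6, 6, 7, 4, 9])

col = a[:, None]   # shape (10, 1): a[i] varies down the rows
sums = col + b     # (10, 1) + (10,) -> (10, 10), sums[i, j] = a[i] + b[j]
mask = a > b       # shape (10,): lines up with the last axis, i.e. with j
C = mask * sums    # (10,) * (10, 10) -> (10, 10), zeroes whole columns

# Reference: the plain double loop over i and j
C_loop = np.zeros((len(a), len(b)), dtype=int)
for i in range(len(a)):
    for j in range(len(b)):
        C_loop[i, j] = (a[j] > b[j]) * (a[i] + b[j])

print(col.shape, sums.shape, mask.shape, C.shape)
print(np.array_equal(C, C_loop))
```

Because the 1d mask is matched against the last axis, it selects columns (index j), which is what the formula asks for.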
The arrays used above:
In [239]: a
Out[239]: array([5, 0, 3, 6, 8, 8, 2, 7, 3, 8])
In [240]: b
Out[240]: array([4, 4, 5, 9, 8, 6, 6, 7, 4, 9])
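When doing whole-array numpy calculations, the fact that many (or just a few) columns will be 0 doesn't make much difference. Working with the whole arrays is usually simpler and faster.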
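You could iterate over the columns and do the addition only for the selected ones. There's no need to iterate over rows. But usually this will be slower than [238]:
In [245]: c = np.zeros((10,10),int)
...: for j in range(10):
...:     if a[j]>b[j]:
...:         c[:,j] = a + b[j]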
Or move the comparison out of the loop:
In [249]: c = np.zeros((10,10),int)
...: m = a>b
...: for j,v in enumerate(m):
...:     if v:
...:         c[:,j] = a + b[j]
Better yet:
In [253]: c = np.zeros((10,10),int)
...: idx = np.nonzero(a>b)[0]
...: print(idx)
...: c[:,idx] = a[:,None] + b[idx]
For some degree of sparsity this last version may be faster, but not here:
In [256]: %%timeit
...: c = np.zeros((10,10),int)
...: idx = np.nonzero(a>b)[0]
...: c[:,idx] = a[:,None] + b[idx]
17.7 µs ± 772 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [258]: timeit (a>b)*(a[:,None]+b)
10.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
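There's nothing wrong with using loops. One property you can take advantage of: the condition a[j]>b[j] tells us whether a whole column is zero, so if we use scipy.sparse.coo_matrix((data,(i,j))) it makes sense to put the outer loop over the columns: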
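from scipy import sparse

data = []
i = []
j = []
for jx in range(len(b)):
    if a[jx] > b[jx]:
        # this whole inner loop is skipped if the condition is not satisfied
        for ix in range(len(a)):
            data.append(a[ix] + b[jx])
            i.append(ix)
            j.append(jx)
C = sparse.coo_matrix((data, (i, j)), shape=(len(a), len(b)))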
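The column-wise loop above can also be written without Python-level loops. The following is a sketch under the assumption that a and b are 1d integer arrays of equal length; build_sparse is a hypothetical name, not a scipy function. It computes only the selected columns and passes their values and coordinates to scipy.sparse.coo_matrix.

```python
import numpy as np
from scipy import sparse

def build_sparse(a, b):
    # Hypothetical helper: C[i, j] = (a[j] > b[j]) * (a[i] + b[j]) as a COO matrix
    n = len(a)
    cols = np.nonzero(a > b)[0]          # columns where the condition holds
    block = a[:, None] + b[cols]         # shape (n, len(cols)), only the kept columns
    # Coordinates matching block.ravel() in row-major order
    row_idx = np.repeat(np.arange(n), len(cols))
    col_idx = np.tile(cols, n)
    return sparse.coo_matrix((block.ravel(), (row_idx, col_idx)), shape=(n, n))

a = np.array([5, 0, 3, 6, 8, 8, 2, 7, 3, 8])
b = np.array([4, 4, 5, 9, 8, 6, 6, 7, 4, 9])
C = build_sparse(a, b)
print(C.nnz)
print(np.array_equal(C.toarray(), (a > b) * (a[:, None] + b)))
```

Note that an entry whose sum a[i]+b[j] happens to be 0 in a selected column is still stored explicitly; whether that matters depends on how the matrix is used afterwards.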