Most efficient way to index into a numpy array from a scipy CSR matrix?
I have a numpy ndarray X with shape (4000, 3), where each sample in X is a 3D coordinate (x, y, z).
I have a scipy CSR matrix nn_rad_csr of shape (4000, 4000), which is the nearest neighbors graph generated by sklearn.neighbors.radius_neighbors_graph(X, 0.01, include_self=True).
nn_rad_csr.toarray()[i] is a shape (4000,) vector, mostly zeros, whose binary weights (0 or 1) mark the edges of the nearest neighbors graph that start at node X[i].
For instance, if nn_rad_csr.toarray()[i][j] == 1 then X[j] is within the nearest neighbor radius of X[i], whereas a value of 0 means it is not a neighbor.
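To make the layout concrete, here is a minimal sketch of reading one row's neighbors straight from the CSR arrays instead of going through toarray(). The small matrix and the helper name neighbors_of are made up for illustration; it assumes every stored entry is a 1, as in a connectivity graph.

```python
import numpy as np
from scipy import sparse

# Hypothetical 4-node adjacency matrix with self-loops, standing in for
# the output of radius_neighbors_graph(..., include_self=True).
dense = np.array([[1, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 1]], dtype=np.uint8)
graph = sparse.csr_matrix(dense)


def neighbors_of(graph, i):
    """Return the column indices stored in row i of a CSR matrix."""
    start, stop = graph.indptr[i], graph.indptr[i + 1]
    return graph.indices[start:stop]


row1 = neighbors_of(graph, 1)  # neighbors of node 1
row3 = neighbors_of(graph, 3)  # node 3 only has itself
```

The slice costs time proportional to the number of neighbors of node i, whereas toarray() materializes all 4000 x 4000 entries first.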
What I'd like is a function radius_graph_conv(X, rad) that returns an array Y in which each row Y[i] is the average of X over the neighbors of X[i]. I'm not sure how to exploit the sparsity of a CSR matrix to perform radius_graph_conv efficiently. I have two naive implementations of graph conv below.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph, KDTree


def radius_graph_conv(X, rad):
    nn_rad_csr = radius_neighbors_graph(X, rad, include_self=True)
    csr_indices = nn_rad_csr.indices
    csr_indptr = nn_rad_csr.indptr
    Y = np.copy(X)
    for i in range(X.shape[0]):
        # row i's neighbors are stored in indices[indptr[i]:indptr[i+1]]
        j, k = csr_indptr[i], csr_indptr[i+1]
        neighbor_idx = csr_indices[j:k]
        rad_neighborhood = X[neighbor_idx]  # ndim always 2
        Y[i] = np.mean(rad_neighborhood, axis=0)
    return Y


def radius_graph_conv_matmul(X, rad):
    nn_rad_arr = radius_neighbors_graph(X, rad, include_self=True).toarray()
    # np.sum(nn_rad_arr, axis=-1) is basically a count of neighbors;
    # keepdims=True makes the division act per row instead of per column
    return np.matmul(nn_rad_arr / np.sum(nn_rad_arr, axis=-1, keepdims=True), X)
Is there a better way to do this? With a kNN graph it's a very simple function, since the number of neighbors is fixed and you can just index into X, but with a radius- or density-based nearest neighbors graph you have to work with a CSR matrix (or an array of arrays if you are using a k-d tree).
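For comparison, the fixed-k case mentioned above might look like the sketch below. It uses scipy.spatial.cKDTree rather than sklearn, and the function name knn_graph_conv is hypothetical. It assumes the query points are the indexed points themselves, so each point shows up among its own k nearest neighbors.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_graph_conv(X, k):
    """Average each point over its k nearest neighbors (itself included)."""
    tree = cKDTree(X)
    _, idx = tree.query(X, k=k)      # neighbor indices, one row per point
    idx = idx.reshape(len(X), -1)    # keep it 2-D even when k == 1
    # X[idx] has shape (n, k, 3) because every point has exactly k neighbors
    return X[idx].mean(axis=1)


rng = np.random.default_rng(0)
X_demo = rng.random((20, 3))
Y_demo = knn_graph_conv(X_demo, 4)
```

This only works because idx is rectangular; a radius graph gives a different neighbor count per row, which is why the CSR layout comes in.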
Here is a direct way of exploiting the CSR format. Your matmul solution probably does similar things under the hood, but we save one lookup (from the .data attribute) by also exploiting that it is an adjacency matrix; also, diffing .indptr should be more efficient than summing the equivalent number of ones.
>>> import numpy as np
>>> from scipy import sparse
>>>
>>> # create mock data
>>> A = np.random.random((100, 100)) < 0.1
>>> A = (A | A.T).view(np.uint8)
>>> AS = sparse.csr_matrix(A)
>>> X = np.random.random((100, 3))
>>>
>>> # dense solution for reference
>>> Xa = A @ X / A.sum(axis=-1, keepdims=True)
>>> # sparse solution
>>> XaS = np.add.reduceat(X[AS.indices], AS.indptr[:-1], axis=0) / np.diff(AS.indptr)[:, None]
>>>
>>> # check they are the same
>>> np.allclose(Xa, XaS)
True
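If it helps, the one-liner can be wrapped up as a function, roughly as sketched below. The names graph_mean_reduceat and graph_mean_matmul are made up here, and both versions assume every stored entry of the matrix is a 1. One thing to watch: np.add.reduceat does not return zero for an empty segment (it returns the single element at that index, and can raise if the empty row is the last one), so the sketch rejects rows with no neighbors. With include_self=True every row has at least one entry, so this should not trigger for the graph in the question. The second function is an alternative built on a sparse-dense product, not part of the original answer.

```python
import numpy as np
from scipy import sparse


def graph_mean_reduceat(adj, X):
    """Neighbor average via reduceat; needs >= 1 stored entry per row."""
    counts = np.diff(adj.indptr)  # neighbors per row
    if np.any(counts == 0):
        raise ValueError("every row needs at least one neighbor")
    sums = np.add.reduceat(X[adj.indices], adj.indptr[:-1], axis=0)
    return sums / counts[:, None]


def graph_mean_matmul(adj, X):
    """Same average via a sparse @ dense product; empty rows give zeros."""
    counts = np.diff(adj.indptr)
    return (adj @ X) / np.maximum(counts, 1)[:, None]


# mock symmetric adjacency with self-loops, like include_self=True
rng = np.random.default_rng(0)
A = rng.random((50, 50)) < 0.1
A = (A | A.T | np.eye(50, dtype=bool)).astype(np.float64)
adj = sparse.csr_matrix(A)
X = rng.random((50, 3))

dense_ref = A @ X / A.sum(axis=-1, keepdims=True)
Y_reduceat = graph_mean_reduceat(adj, X)
Y_matmul = graph_mean_matmul(adj, X)
```

Which of the two is faster probably depends on the matrix size and density, so it is worth timing both on your real data.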