
Most efficient way to index into a numpy array from a scipy CSR matrix?

I have a numpy ndarray X with shape (4000, 3), where each sample in X is a 3D coordinate (x, y, z).

I have a scipy CSR matrix nn_rad_csr of shape (4000, 4000), which is the nearest neighbors graph generated from sklearn.neighbors.radius_neighbors_graph(X, 0.01, include_self=True).

nn_rad_csr.toarray()[i] is a vector of shape (4000,), mostly zeros, whose binary weights (0 or 1) mark the edges leaving node X[i] in the nearest neighbors graph.

For instance, if nn_rad_csr.toarray()[i][j] == 1 then X[j] is within the nearest neighbor radius of X[i], whereas a value of 0 means it is not a neighbor.
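The same neighbor lookup can be read straight from the CSR arrays without calling toarray(). Below is a minimal sketch on a hypothetical 4-node adjacency matrix; the helper name neighbors_of and the toy data are illustrative assumptions, not part of the original question.

```python
import numpy as np
from scipy import sparse

# Hypothetical 4-node adjacency matrix with self-loops,
# mimicking what include_self=True would produce.
dense = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
], dtype=np.uint8)
adj = sparse.csr_matrix(dense)

def neighbors_of(adj_csr, i):
    """Column indices stored for row i of a CSR matrix, without densifying."""
    start, stop = adj_csr.indptr[i], adj_csr.indptr[i + 1]
    return adj_csr.indices[start:stop]

for i in range(dense.shape[0]):
    # The CSR slice lists the same columns as the nonzeros of the dense row.
    assert np.array_equal(np.sort(neighbors_of(adj, i)), np.flatnonzero(dense[i]))

# Node 1 is adjacent to nodes 0, 1 (itself) and 3.
print(neighbors_of(adj, 1))
```

Row i occupies the slice indptr[i]:indptr[i+1] of indices, so the cost of a lookup scales with the number of neighbors rather than with the number of nodes.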

What I'd like to do is have a function radius_graph_conv(X, rad) which returns an array Y in which each row of X is replaced by the average of its neighbors' values. I'm not sure how to exploit the sparsity of a CSR matrix to perform radius_graph_conv efficiently. I have two naive implementations of graph conv below.

import numpy as np
from sklearn.neighbors import radius_neighbors_graph, KDTree

def radius_graph_conv(X, rad):
    nn_rad_csr = radius_neighbors_graph(X, rad, include_self=True)
    csr_indices = nn_rad_csr.indices
    csr_indptr  = nn_rad_csr.indptr
    Y = np.copy(X)
    for i in range(X.shape[0]):
        j, k = csr_indptr[i], csr_indptr[i+1]
        neighbor_idx = csr_indices[j:k]
        rad_neighborhood = X[neighbor_idx] # ndim always 2
        Y[i] = np.mean(rad_neighborhood, axis=0)
    return Y

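def radius_graph_conv_matmul(X, rad):
    nn_rad_arr = radius_neighbors_graph(X, rad, include_self=True).toarray()
    # Row sums count each node's neighbors; keepdims=True keeps shape (N, 1)
    # so the division normalizes each row rather than each column.
    neighbor_counts = np.sum(nn_rad_arr, axis=-1, keepdims=True)
    return np.matmul(nn_rad_arr / neighbor_counts, X)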

Is there a better way to do this? With a knn graph, it's a very simple function, since the number of neighbors is fixed and you can just index into X, but with a radius- or density-based nearest neighbors graph, you have to work with a CSR matrix (or an array of arrays if you are using a k-d tree).

Here is the direct way of exploiting the CSR format. Your matmul solution probably does similar things under the hood. But we save one lookup (from the .data attribute) by also exploiting that it is an adjacency matrix; also, diffing .indptr should be more efficient than summing the equivalent number of ones.

>>> import numpy as np
>>> from scipy import sparse
>>>
>>> # create mock data
>>> A = np.random.random((100, 100)) < 0.1
>>> A = (A | A.T).view(np.uint8)
>>> AS = sparse.csr_matrix(A)
>>> X = np.random.random((100, 3))
>>>
>>> # dense solution for reference
>>> Xa = A @ X / A.sum(axis=-1, keepdims=True)
>>> # sparse solution
>>> XaS = np.add.reduceat(X[AS.indices], AS.indptr[:-1], axis=0) / np.diff(AS.indptr)[:, None]
>>>
>>> # check they are the same
>>> np.allclose(Xa, XaS)
True
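For reuse, the reduceat idea can be wrapped in a function that takes the CSR graph directly. The sketch below is one possible packaging, not a definitive implementation: the function names graph_conv_reduceat and graph_conv_spmm are hypothetical, the graph is built from a brute-force distance matrix as a stand-in for radius_neighbors_graph(X, rad, include_self=True), and it assumes a binary adjacency matrix with no explicitly stored zeros. It also adds a sparse-dense product as a second variant for comparison.

```python
import numpy as np
from scipy import sparse

def graph_conv_reduceat(X, adj_csr):
    """Neighbor mean via segment sums over the CSR index arrays."""
    counts = np.diff(adj_csr.indptr)  # stored entries per row = neighbor count
    if np.any(counts == 0):
        # np.add.reduceat does not yield zero for an empty segment,
        # so rows without neighbors are rejected up front.
        raise ValueError("every node needs at least one neighbor (e.g. itself)")
    sums = np.add.reduceat(X[adj_csr.indices], adj_csr.indptr[:-1], axis=0)
    return sums / counts[:, None]

def graph_conv_spmm(X, adj_csr):
    """Same mean via a sparse-dense product; never calls .toarray()."""
    counts = np.diff(adj_csr.indptr)
    return (adj_csr @ X) / counts[:, None]

# Mock data: brute-force radius graph (assumption: stands in for sklearn's
# radius_neighbors_graph with include_self=True on a small point set).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
rad = 0.2
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
adj = sparse.csr_matrix((dist <= rad).astype(np.float64))

# Dense reference, only feasible because the mock graph is small.
dense = adj.toarray()
dense_ref = dense @ X / dense.sum(axis=-1, keepdims=True)

assert np.allclose(graph_conv_reduceat(X, adj), dense_ref)
assert np.allclose(graph_conv_spmm(X, adj), dense_ref)
```

The reduceat variant relies on every row having at least one stored entry, which include_self=True guarantees. Which of the two is faster likely depends on graph size and density, so timing both on the real data is advisable.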


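Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.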
