Most efficient way to index into a numpy array from a Scipy CSR matrix?

Question

I have a numpy ndarray X with shape (4000, 3), where each sample in X is a 3D coordinate (x, y, z).

I have a scipy CSR matrix nn_rad_csr of shape (4000, 4000), which is the nearest neighbors graph generated from sklearn.neighbors.radius_neighbors_graph(X, 0.01, include_self=True).

nn_rad_csr.toarray()[i] is a mostly-zero vector of shape (4000,) with binary weights (0 or 1) for the edges from node X[i] in the nearest neighbors graph.

For instance, if nn_rad_csr.toarray()[i][j] == 1 then X[j] is within the nearest neighbor radius of X[i], whereas a value of 0 means it is not a neighbor.
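To make the layout concrete, here is a minimal sketch of how the same neighbor lookup can be read straight from the CSR arrays without calling toarray(). The 4-node matrix and the helper name neighbors_of are made up for illustration; only the indptr and indices attributes come from scipy.

```python
import numpy as np
from scipy import sparse

# Hypothetical 4-node graph; row i marks the neighbors of node i (self included).
dense = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=np.uint8)
adj = sparse.csr_matrix(dense)

def neighbors_of(adj_csr, i):
    """Return the column indices of the stored entries in row i.

    Assumes the matrix holds no explicitly stored zeros, so every
    stored entry is an edge.
    """
    start, stop = adj_csr.indptr[i], adj_csr.indptr[i + 1]
    return adj_csr.indices[start:stop]

for i in range(adj.shape[0]):
    # Matches np.flatnonzero(dense[i]) without building the dense row.
    print(i, neighbors_of(adj, i))
```

Row i of the graph occupies the slice indptr[i]:indptr[i+1] of indices, so a lookup costs time proportional to the number of neighbors rather than to the number of nodes.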

What I'd like to do is have a function radius_graph_conv(X, rad) which returns an array Y in which each row Y[i] is the mean of X over the neighbors of X[i]. I'm not sure how to exploit the sparsity of a CSR matrix to perform radius_graph_conv efficiently. I have two naive implementations of the graph conv below.

import numpy as np
from sklearn.neighbors import radius_neighbors_graph, KDTree

def radius_graph_conv(X, rad):
    nn_rad_csr = radius_neighbors_graph(X, rad, include_self=True)
    csr_indices = nn_rad_csr.indices
    csr_indptr  = nn_rad_csr.indptr
    Y = np.copy(X)
    for i in range(X.shape[0]):
        j, k = csr_indptr[i], csr_indptr[i+1]
        neighbor_idx = csr_indices[j:k]
        rad_neighborhood = X[neighbor_idx] # ndim always 2
        Y[i] = np.mean(rad_neighborhood, axis=0)
    return Y

def radius_graph_conv_matmul(X, rad):
    nn_rad_arr = radius_neighbors_graph(X, rad, include_self=True).toarray()
    # Row sums count each node's neighbors; keepdims=True keeps shape (n, 1)
    # so that row i is divided by its own neighbor count.
    neighbor_counts = np.sum(nn_rad_arr, axis=-1, keepdims=True)
    return np.matmul(nn_rad_arr / neighbor_counts, X)

Is there a better way to do this? With a knn graph it's a very simple function, since the number of neighbors is fixed and you can just index into X, but with a radius- or density-based nearest neighbors graph you have to work with a CSR matrix (or an array of arrays if you are using a kd tree).
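For comparison, the fixed-size knn case mentioned above might look like the following sketch. The brute-force neighbor search, the array sizes, and the names knn_idx and k are assumptions chosen for illustration, not part of the original setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))   # 10 hypothetical points in 3D
k = 4                     # fixed number of neighbors, the point itself included

# Brute-force squared distances, then the k closest indices per point: shape (10, k).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
knn_idx = np.argsort(d2, axis=1)[:, :k]

# Fancy indexing yields shape (10, k, 3); averaging over axis 1 gives the knn mean.
Y = X[knn_idx].mean(axis=1)
```

Because every row of knn_idx has the same length, the neighborhoods stack into one rectangular array. A radius graph gives rows of varying length, which is why the CSR arrays are needed there.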

Answer 1

Here is a direct way of exploiting the CSR format. Your matmul solution probably does similar things under the hood. But we save one lookup (from the .data attribute) by also exploiting the fact that this is an adjacency matrix; also, diffing .indptr should be more efficient than summing the equivalent number of ones.

>>> import numpy as np
>>> from scipy import sparse
>>>
>>> # create mock data
>>> A = np.random.random((100, 100)) < 0.1
>>> A = (A | A.T).view(np.uint8)
>>> AS = sparse.csr_matrix(A)
>>> X = np.random.random((100, 3))
>>>
>>> # dense solution for reference
>>> Xa = A @ X / A.sum(axis=-1, keepdims=True)
>>> # sparse solution
>>> XaS = np.add.reduceat(X[AS.indices], AS.indptr[:-1], axis=0) / np.diff(AS.indptr)[:, None]
>>>
>>> # check they are the same
>>> np.allclose(Xa, XaS)
True
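The one-liner above could be wrapped into reusable functions roughly as follows. This is a sketch, not a tuned implementation: the function names are hypothetical, the second variant (a sparse-dense product) is an alternative added here for comparison, and both assume a binary adjacency matrix with no explicitly stored zeros, so that np.diff(indptr) equals the neighbor count.

```python
import numpy as np
from scipy import sparse

def neighbor_mean_reduceat(adj_csr, X):
    """Average rows of X over each node's neighbors using indptr/indices only.

    Assumes every row has at least one stored entry, which holds when the
    graph is built with include_self=True.
    """
    counts = np.diff(adj_csr.indptr)
    if (counts == 0).any():
        # For an empty segment, reduceat returns a single element, not zero.
        raise ValueError("adjacency matrix has a row without neighbors")
    sums = np.add.reduceat(X[adj_csr.indices], adj_csr.indptr[:-1], axis=0)
    return sums / counts[:, None]

def neighbor_mean_spmm(adj_csr, X):
    """Same average via a sparse-dense product (reads .data as well)."""
    counts = np.diff(adj_csr.indptr)
    return (adj_csr @ X) / counts[:, None]

# Mock data: symmetric graph with self-loops, so no row is empty.
rng = np.random.default_rng(0)
A = rng.random((50, 50)) < 0.1
A = (A | A.T | np.eye(50, dtype=bool)).astype(np.float64)
adj = sparse.csr_matrix(A)
X = rng.random((50, 3))

Y1 = neighbor_mean_reduceat(adj, X)
Y2 = neighbor_mean_spmm(adj, X)
```

In the question's setting, adj would be the result of radius_neighbors_graph(X, rad, include_self=True), which is already a CSR matrix. Which variant is faster likely depends on graph size and density, so it is worth timing both on real data.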

Answer 1: accepted, 2018-01-22 22:12:14
