Find n random zero elements in a scipy sparse csr_matrix
I want to find n zero elements in a sparse matrix. I wrote the code below:
from random import randint
result = []
counter = 0
while counter < n:
    r = randint(0, W.shape[0]-1)
    c = randint(0, W.shape[1]-1)
    if W[r,c] == 0:
        result.append([r,c])
        counter += 1
Unfortunately, it is very slow. I want something more efficient. Is there any way to access the zero elements of a scipy sparse matrix quickly?
First, here's some code to create some sample data:
import numpy as np
rows, cols = 10,20 # Shape of W
nonzeros = 7 # How many nonzeros exist in W
zeros = 70 # How many zeros we want to randomly select
W = np.zeros((rows,cols), dtype=int)
nonzero_rows = np.random.randint(0, rows, size=(nonzeros,))
nonzero_cols = np.random.randint(0, cols, size=(nonzeros,))
W[nonzero_rows, nonzero_cols] = 20
The above code creates W as a sparse (mostly zero) numpy array of shape (10,20) with at most 7 non-zero elements out of 200 (fewer if two of the random positions coincide). All the non-zero elements have the value 20.
Here's the solution to pick zeros=70 zero elements from this sparse matrix:
argwhere_res = np.argwhere(np.logical_not(W))
zero_count = len(argwhere_res)
ids = np.random.choice(range(zero_count), size=(zeros,))
res = argwhere_res[ids]
res is now an array of shape (70,2) giving the locations of the 70 elements that we have randomly chosen from W.
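Note that this does not involve any loops. np.random.choice samples with replacement by default, so the same location can appear more than once; pass replace=False if the locations must be distinct.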
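The approach above works on a dense array, while the question is about a scipy csr_matrix. As a rough sketch of the same idea without densifying W, one option is to draw ranks among the zero cells and map each rank to its flat position by skipping over the stored nonzeros. The helper name `sample_zero_cells` and its `rng` parameter are assumptions made for illustration; they do not come from the answer above.

```python
import numpy as np
import scipy.sparse as sparse

def sample_zero_cells(W, n, rng=None):
    """Hypothetical helper: n distinct zero cells of a sparse matrix, as an (n, 2) array."""
    rng = np.random.default_rng() if rng is None else rng
    nrows, ncols = W.shape
    # Sorted flat (row-major) indices of the nonzero cells.
    r, c = W.nonzero()
    nz = np.unique(r.astype(np.int64) * ncols + c)
    n_zero = nrows * ncols - len(nz)
    if n > n_zero:
        raise ValueError("n exceeds the number of zero cells")
    # Draw n distinct ranks; rank j stands for the j-th zero in row-major order.
    ranks = rng.choice(n_zero, size=n, replace=False)
    # gaps[i] = number of zero cells that precede the i-th nonzero cell.
    gaps = nz - np.arange(len(nz))
    # The j-th zero comes after every nonzero with gaps <= j, so shift by that count.
    flat = ranks + np.searchsorted(gaps, ranks, side="right")
    return np.column_stack(np.divmod(flat, ncols))
```

Calling `sample_zero_cells(W, 70)` would return a (70, 2) array of row/column pairs, analogous to res above but without repeated locations. Memory use scales with the number of nonzeros and n rather than with the full shape of W.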
First make a list of all the 0's:
list_0s = [(j, i) for j in range(len(matrix)) for i in range(len(matrix[j])) if matrix[j,i] == 0]
Then get your random choices:
random_0s = random.choices(list_0s, k=n)
Testing this with:
matrix = np.random.randint(1000, size=(1000,1000))
n = 100
This takes 0.34 seconds.
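One detail worth spelling out: `random.choices` draws with replacement, so the same zero cell can be returned more than once, whereas `random.sample` returns distinct cells. A minimal sketch of the difference, using a small made-up nested-list matrix (not the 1000x1000 test matrix above):

```python
import random

# Hypothetical 3x4 matrix stored as nested lists; 0 marks an empty cell.
matrix = [
    [0, 3, 0, 0],
    [5, 0, 0, 2],
    [0, 0, 7, 0],
]

# Same idea as list_0s above: collect the (row, col) of every zero.
zero_cells = [(j, i) for j, row in enumerate(matrix)
              for i, value in enumerate(row) if value == 0]

n = 5
with_repeats = random.choices(zero_cells, k=n)  # may contain the same cell twice
no_repeats = random.sample(zero_cells, k=n)     # n distinct cells; needs n <= len(zero_cells)
```

Which of the two is appropriate depends on whether the caller can tolerate repeated locations.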
import numpy as np
import scipy.sparse as sparse
import random
randint = random.randint
def orig(W, n):
    result = list()
    while len(result) < n:
        r = randint(0, W.shape[0]-1)
        c = randint(0, W.shape[1]-1)
        if W[r,c] == 0:
            result.append((r,c))
    return result
def alt(W, n):
    nrows, ncols = W.shape
    density = n / (nrows*ncols - W.count_nonzero())
    W = W.copy()
    W.data[:] = 1
    W2 = sparse.csr_matrix((nrows, ncols))
    while W2.count_nonzero() < n:
        W2 += sparse.random(nrows, ncols, density=density, format='csr')
        # remove nonzero values from W2 where W is 1
        W2 -= W2.multiply(W)
    W2 = W2.tocoo()
    r = W2.row[:n]
    c = W2.col[:n]
    result = list(zip(r, c))
    return result
def alt_with_dupes(W, n):
    nrows, ncols = W.shape
    density = n / (nrows*ncols - W.count_nonzero())
    W = W.copy()
    W.data[:] = 1
    W2 = sparse.csr_matrix((nrows, ncols))
    while W2.data.sum() < n:
        tmp = sparse.random(nrows, ncols, density=density, format='csr')
        tmp.data[:] = 1
        W2 += tmp
        # remove nonzero values from W2 where W is 1
        W2 -= W2.multiply(W)
    W2 = W2.tocoo()
    num_repeats = W2.data.astype('int')
    r = np.repeat(W2.row, num_repeats)
    c = np.repeat(W2.col, num_repeats)
    idx = np.random.choice(len(r), n)
    result = list(zip(r[idx], c[idx]))
    return result
Here's a benchmark:
W = sparse.random(1000, 50000, density=0.02, format='csr')
n = int((np.multiply(*W.shape) - W.nnz)*0.01)
In [194]: %timeit alt(W, n)
809 ms ± 261 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [195]: %timeit orig(W, n)
11.2 s ± 121 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [223]: %timeit alt_with_dupes(W, n)
986 ms ± 290 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Note that alt returns a list with no duplicates. Both orig and alt_with_dupes may return duplicates.
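The duplicate behaviour described above can be checked directly on a result list. The following is a small sketch of such a check; `check_result` is a hypothetical helper name, and it assumes the result is a sequence of (row, col) pairs like the ones returned by orig, alt and alt_with_dupes.

```python
import numpy as np
import scipy.sparse as sparse

def check_result(W, result):
    """Hypothetical helper: return (all_zero, n_duplicates) for a list of (row, col) pairs."""
    pairs = np.asarray(result, dtype=np.int64).reshape(-1, 2)
    rows, cols = pairs[:, 0], pairs[:, 1]
    # Indexing a sparse matrix with index arrays may return a 1 x k matrix; flatten it.
    values = np.asarray(W[rows, cols]).ravel()
    all_zero = bool(np.all(values == 0))
    # Count how many pairs repeat an earlier (row, col) location.
    n_unique = len(np.unique(pairs, axis=0))
    return all_zero, len(pairs) - n_unique
```

For example, `check_result(W, alt(W, n))` would be expected to report zero duplicates, while the other two functions may report a positive count.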