Finding n random zero elements in a scipy sparse csr_matrix

Question

I want to find n zero elements in a sparse matrix. I wrote the following code:

from random import randint

result = []
counter = 0
while counter < n:
    # randint is inclusive on both ends, hence the -1
    r = randint(0, W.shape[0]-1)
    c = randint(0, W.shape[1]-1)
    if W[r,c] == 0:
        result.append([r,c])
        counter += 1

Unfortunately, it is very slow. I would like something more efficient. Is there a way to quickly access the zero elements of a scipy sparse matrix?

Answer 1

First, here is some code to create sample data:

import numpy as np
rows, cols = 10,20   # Shape of W
nonzeros = 7         # How many nonzeros exist in W
zeros = 70           # How many zeros we want to randomly select

W = np.zeros((rows,cols), dtype=int)
nonzero_rows = np.random.randint(0, rows, size=(nonzeros,))
nonzero_cols = np.random.randint(0, cols, size=(nonzeros,))
W[nonzero_rows, nonzero_cols] = 20

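The code above creates W as a mostly-zero numpy array (a dense ndarray, not a scipy sparse matrix) of shape (10,20) with at most 7 nonzero elements out of 200; there are fewer if some of the random positions coincide. All nonzero elements have the value 20.

Here is a solution that selects zeros=70 zero elements from this matrix: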

argwhere_res = np.argwhere(np.logical_not(W))
zero_count = len(argwhere_res)
ids = np.random.choice(range(zero_count), size=(zeros,))
res = argwhere_res[ids]

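res is now an array of shape (70,2) giving the positions of the 70 elements randomly selected from W.

Note that this does not involve any loops. Also note that np.random.choice samples with replacement by default, so res can contain the same position more than once; pass replace=False if the positions must be distinct.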
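The answer above works on a dense numpy array, while the question asks about a scipy csr_matrix. As a hedged sketch of how the same "enumerate the zeros, then pick at random" idea might be carried over without building the dense array: number the zeros in row-major order, draw n distinct ranks, and map each rank back to a flat index by counting how many nonzeros precede it. The function name `sample_zero_positions` and its `rng` parameter are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import scipy.sparse as sparse


def sample_zero_positions(W, n, rng=None):
    """Return n distinct (row, col) positions where W is zero (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    nrows, ncols = W.shape
    total = nrows * ncols

    # Flat (row-major) indices of the true nonzeros, sorted and deduplicated.
    # W.nonzero() skips explicitly stored zeros.
    nz_rows, nz_cols = W.nonzero()
    nz_flat = np.unique(nz_rows.astype(np.int64) * ncols + nz_cols)

    num_zeros = total - len(nz_flat)
    if n > num_zeros:
        raise ValueError("n exceeds the number of zero elements")

    # Draw n distinct ranks; rank r means "the r-th zero in row-major order".
    ranks = rng.choice(num_zeros, size=n, replace=False)

    # zeros_before[i] = number of zeros that precede the i-th nonzero.
    zeros_before = nz_flat - np.arange(len(nz_flat))
    # The r-th zero sits at flat index r + (number of nonzeros before it).
    flat = ranks + np.searchsorted(zeros_before, ranks, side="right")

    # Convert flat indices back to (row, col) pairs, shape (n, 2).
    return np.column_stack(np.divmod(flat, ncols))
```

Memory use here scales with the number of nonzeros plus n rather than with the full matrix size, which is the main reason to prefer it over `np.argwhere` on a densified copy when the matrix is large.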

Answer 2

First, build a list of all the zeros:

list_0s = [(j, i) for j in range(matrix.shape[0]) for i in range(matrix.shape[1]) if matrix[j, i] == 0]

Then take your random selection:

random_0s = random.choices(list_0s, k=n)

Testing this with:

matrix = np.random.randint(1000, size=(1000,1000))
n = 100

it takes 0.34 seconds.
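For reference, the two steps of this answer can be put together into one runnable script. This is only a sketch: the wrapper name `random_zero_positions` is made up for illustration, and the matrix is smaller than the one in the test above so that it runs quickly. Note that `random.choices` samples with replacement, so the same position may be picked more than once.

```python
import random

import numpy as np


def random_zero_positions(matrix, n):
    """Pick n positions (with replacement) where a dense 2-D array is zero."""
    rows, cols = matrix.shape
    # Enumerate every (row, col) whose value is zero.
    list_0s = [(j, i) for j in range(rows) for i in range(cols) if matrix[j, i] == 0]
    # random.choices raises IndexError if list_0s is empty.
    return random.choices(list_0s, k=n)


# Values are drawn from 0..9, so roughly a tenth of the entries are zero.
matrix = np.random.randint(10, size=(200, 200))
picks = random_zero_positions(matrix, 100)
```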

Answer 3

import numpy as np
import scipy.sparse as sparse
import random
randint = random.randint

def orig(W, n):
    result = list()
    while len(result) < n:
        r = randint(0, W.shape[0]-1)
        c = randint(0, W.shape[1]-1)
        if W[r,c] == 0:
            result.append((r,c))
    return result

def alt(W, n):
    nrows, ncols = W.shape
    density = n / (nrows*ncols - W.count_nonzero())
    W = W.copy()
    W.data[:] = 1
    W2 = sparse.csr_matrix((nrows, ncols))
    while W2.count_nonzero() < n:
        W2 += sparse.random(nrows, ncols, density=density, format='csr')
        # remove nonzero values from W2 where W is 1
        W2 -= W2.multiply(W)
    W2 = W2.tocoo()    
    r = W2.row[:n]
    c = W2.col[:n]
    result = list(zip(r, c))
    return result

def alt_with_dupes(W, n):
    nrows, ncols = W.shape
    density = n / (nrows*ncols - W.count_nonzero())
    W = W.copy()
    W.data[:] = 1
    W2 = sparse.csr_matrix((nrows, ncols))
    while W2.data.sum() < n:
        tmp = sparse.random(nrows, ncols, density=density, format='csr')
        tmp.data[:] = 1
        W2 += tmp
        # remove nonzero values from W2 where W is 1
        W2 -= W2.multiply(W)
    W2 = W2.tocoo()
    num_repeats = W2.data.astype('int')
    r = np.repeat(W2.row, num_repeats)
    c = np.repeat(W2.col, num_repeats)
    idx = np.random.choice(len(r), n)
    result = list(zip(r[idx], c[idx]))
    return result

Here is a benchmark:

W = sparse.random(1000, 50000, density=0.02, format='csr')
n = int((np.multiply(*W.shape) - W.nnz)*0.01)

In [194]: %timeit alt(W, n)
809 ms ± 261 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [195]: %timeit orig(W, n)
11.2 s ± 121 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [223]: %timeit alt_with_dupes(W, n)
986 ms ± 290 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Note that alt returns a list without duplicates. orig and alt_with_dupes may return duplicates.
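One way to sanity-check the output of any of these functions is to confirm that every returned position is zero in W and to count repeated positions. The helper below is a minimal sketch; the name `check_positions` is hypothetical and does not appear in the original answer.

```python
import numpy as np
import scipy.sparse as sparse


def check_positions(W, positions):
    """Return (all_zero, num_duplicates) for a non-empty list of (row, col) pairs."""
    rows = np.array([r for r, _ in positions], dtype=np.int64)
    cols = np.array([c for _, c in positions], dtype=np.int64)
    # Paired fancy indexing selects W[rows[k], cols[k]] for each k; flatten to 1-D.
    values = np.asarray(W[rows, cols]).ravel()
    all_zero = bool(np.all(values == 0))
    # Duplicates = total picks minus the number of distinct positions.
    num_duplicates = len(positions) - len(set(zip(rows.tolist(), cols.tolist())))
    return all_zero, num_duplicates
```

With the functions above, `check_positions(W, alt(W, n))` would be expected to report zero duplicates, while the other two may report a positive count.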
