Rows of sparse matrix where no column is zero

I have a matrix like this:

import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array(
      [[1, 1, 2, 1],
       [0, 0, 2, 0],
       [1, 4, 1, 1],
       [0, 1, 0, 0]]))

I want to select all the rows in which every column is nonzero, so that I can then get their sum. The selection could be a boolean array:

rows = [True, False, True, False]
result = A[rows].sum()

Or as indices:

rows = [0, 2]
result = A[rows].sum()

However, I am stuck at the first part: figuring out which rows to include in the sum. Most results I find are looking for the opposite (rows where all columns are zero).

This is a bit easier to do for NumPy arrays than for sparse matrices. If you do not mind converting to a dense NumPy array as an intermediate step, you can get the right rows via

(A.toarray() != 0).all(axis=1)

which produces

array([ True, False,  True, False])

and then use it to index A:

A[(A.toarray() != 0).all(axis=1),:].sum()

which returns 12.
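Put together, the steps above amount to something like the following self-contained script. It is only a sketch: the names mask and result are illustrative, and it assumes the matrix is small enough to densify in memory.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array(
    [[1, 1, 2, 1],
     [0, 0, 2, 0],
     [1, 4, 1, 1],
     [0, 1, 0, 0]]))

# Dense boolean mask: True for rows in which every column is nonzero.
mask = (A.toarray() != 0).all(axis=1)

# Use the mask to select rows of the sparse matrix, then sum what is left.
result = A[mask, :].sum()

print(mask)
print(result)
```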

In [35]: from scipy import sparse
In [36]: A = sparse.csr_matrix(np.array(
    ...:       [[1, 1, 2, 1],
    ...:        [0, 0, 2, 0],
    ...:        [1, 4, 1, 1],
    ...:        [0, 1, 0, 0]]))
In [37]: A
Out[37]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 10 stored elements in Compressed Sparse Row format>

Sparse doesn't do 'all/any' kinds of operations because they treat 0's as significant values.

all on the dense equivalent works nicely (A.A is shorthand for A.toarray()):

In [41]: A.A.all(axis=1)
Out[41]: array([ True, False,  True, False])

On the sparse matrix we can convert the dtype to boolean and sum along the axis, then test whether each row's count equals the full number of columns (4 here):

In [42]: A.astype(bool).sum(axis=1)
Out[42]: 
matrix([[4],
        [1],
        [4],
        [1]])
In [43]: A.astype(bool).sum(axis=1).A1==4
Out[43]: array([ True, False,  True, False])

Notice that the sparse sum returns an np.matrix. I used A1 to turn that into a 1d array.
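The same test can be written without hard-coding the 4, by comparing against A.shape[1]. The sketch below is one way to do that; the names n_cols, counts and full_rows are illustrative, and np.asarray(...).ravel() is used instead of A1 on the assumption that it works whether sum returns an np.matrix or a plain array.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array(
    [[1, 1, 2, 1],
     [0, 0, 2, 0],
     [1, 4, 1, 1],
     [0, 1, 0, 0]]))

n_cols = A.shape[1]

# Count nonzero entries per row without densifying A.
# The row sums come back with shape (n_rows, 1); flatten them to 1-D.
counts = np.asarray(A.astype(bool).sum(axis=1)).ravel()

# A row qualifies when every one of its columns is nonzero.
full_rows = counts == n_cols

print(counts)
print(full_rows)
```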

If the matrix isn't too large, working with the dense array may be faster. Sparse operations like sum are actually performed with matrix multiplication.

In [51]: A.astype(bool)@np.ones(4,int)
Out[51]: array([4, 1, 4, 1])

Or we could convert it to lil format, and look at the length of the 'rows':

In [67]: A.tolil().data
Out[67]: 
array([list([1, 1, 2, 1]), list([2]), list([1, 4, 1, 1]), list([1])],
      dtype=object)
In [68]: [len(i) for i in A.tolil().data]
Out[68]: [4, 1, 4, 1]

But wait, there's more. The indptr attribute of the csr matrix marks where each row starts in the data array, so its differences give the number of stored elements per row:

In [69]: A.indptr
Out[69]: array([ 0,  4,  5,  9, 10], dtype=int32)
In [70]: np.diff(A.indptr)
Out[70]: array([4, 1, 4, 1], dtype=int32)

I've omitted some test timings, but this last approach is clearly the fastest!
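One caveat with the indptr approach: np.diff(A.indptr) counts stored elements, so an explicitly stored 0 would be counted as if it were nonzero. A minimal sketch of a helper that guards against this is below. The function name sum_of_full_rows is hypothetical, and it works on a copy so that the caller's matrix is left untouched.

```python
import numpy as np
import scipy.sparse as sp

def sum_of_full_rows(A):
    """Sum all entries in the rows of A that contain no zero (illustrative helper)."""
    A = sp.csr_matrix(A, copy=True)  # CSR copy; the input is not modified
    A.sum_duplicates()               # merge repeated (row, col) entries
    A.eliminate_zeros()              # drop explicitly stored zeros
    stored_per_row = np.diff(A.indptr)
    full = stored_per_row == A.shape[1]
    return A[full, :].sum()

A = sp.csr_matrix(np.array(
    [[1, 1, 2, 1],
     [0, 0, 2, 0],
     [1, 4, 1, 1],
     [0, 1, 0, 0]]))
print(sum_of_full_rows(A))

# A matrix with an explicitly stored zero in row 0: [[1, 0], [3, 4]].
B = sp.csr_matrix((np.array([1, 0, 3, 4]),
                   np.array([0, 1, 0, 1]),
                   np.array([0, 2, 4])), shape=(2, 2))
print(sum_of_full_rows(B))
```

If the matrix is known to hold no explicit zeros or duplicate entries, the two cleanup calls can be skipped and the plain np.diff(A.indptr) comparison is enough.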
