
Create a sparse diagonal matrix from a row of a sparse matrix

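I work with fairly large matrices in Python/SciPy. I need to extract rows from a large matrix (loaded as a coo_matrix) and use each row as the diagonal of a new matrix. Currently I do it like this: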

import profile

import numpy as np
from scipy import sparse

def computation(A):
  for i in range(A.shape[0]):
    diag_elems = np.array(A[i,:].todense())
    ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
    #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csc")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')

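The profile output shows that most of the time is spent in the get_csr_submatrix function while extracting diag_elems. This makes me think that I am using either an inefficient sparse representation of the initial data or the wrong way of extracting a row from a sparse matrix. Can you suggest a better way to extract a row from a sparse matrix and represent it in diagonal form?

Edit

The following variant removes the row-extraction bottleneck (note that simply changing 'csc' to 'csr' is not enough; A[i,:] must also be replaced with A.getrow(i)). The main question remains, however: how to skip the materialization (.todense()) and build the diagonal matrix directly from the sparse representation of the row.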

import profile

import numpy as np
from scipy import sparse

def computation(A):
  for i in range(A.shape[0]):
    diag_elems = np.array(A.getrow(i).todense())
    ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
    #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csr")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')

If I create the DIAgonal matrix directly from a 1-row CSR matrix, as follows:

diag_elems = A.getrow(i)
ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1])

then I can neither pass the format="csc" argument nor convert ith_diag to CSC format:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/profile.py", line 70, in run
    prof = prof.run(statement)
  File "/usr/local/lib/python2.6/profile.py", line 456, in run
    return self.runctx(cmd, dict, dict)
  File "/usr/local/lib/python2.6/profile.py", line 462, in runctx
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
  File "<stdin>", line 4, in computation
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/construct.py", line 56, in spdiags
    return dia_matrix((data, diags), shape=(m,n)).asformat(format)
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/base.py", line 211, in asformat
    return getattr(self,'to' + format)()
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/dia.py", line 173, in tocsc
    return self.tocoo().tocsc()
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/coo.py", line 263, in tocsc
    data    = np.empty(self.nnz, dtype=upcast(self.dtype))
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/sputils.py", line 47, in upcast
    raise TypeError,'no supported conversion for types: %s' % args
TypeError: no supported conversion for types: object
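
The traceback suggests that spdiags ends up with object-dtype data when it is handed a sparse row instead of a dense array. One possible way around the dense intermediate, sketched here as an assumption rather than a confirmed fix, is to convert the 1-row matrix to COO and reuse its column indices as both the row and column coordinates of the diagonal. The helper name row_to_diag and the small test matrix are hypothetical, introduced only for illustration.

```python
import numpy as np
from scipy import sparse

def row_to_diag(A, i):
    """Hypothetical helper: row i of sparse matrix A as an n-by-n CSC diagonal matrix."""
    n = A.shape[1]
    # 1 x n COO matrix holding only the stored entries of row i.
    row = A.getrow(i).tocoo()
    # A value stored at column j of the row goes to position (j, j).
    return sparse.csc_matrix((row.data, (row.col, row.col)), shape=(n, n))

# Tiny illustrative matrix (not the data from the question).
A = sparse.csr_matrix(np.array([[0, 2, 0, 3],
                                [1, 0, 0, 0]]))
D = row_to_diag(A, 0)
```

Because only the stored entries of the row are touched, the cost of each call should scale with the number of non-zeros in the row rather than with the row length.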

Here is what I came up with:

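def computation(A):
    # A must be in CSR format so that indptr delimits the rows.
    n = A.shape[1]
    for i in range(A.shape[0]):
        # The stored entries of row i live in data[idx_begin:idx_end].
        idx_begin = A.indptr[i]
        idx_end = A.indptr[i+1]
        diag_elems = A.data[idx_begin:idx_end]
        diag_indices = A.indices[idx_begin:idx_end]
        # Place each value at (column, column) to form the diagonal.
        ith_diag = sparse.csc_matrix((diag_elems, (diag_indices, diag_indices)), shape=(n, n))
        # Drop any explicitly stored zeros.
        ith_diag.eliminate_zeros()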

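The Python profiler reports 1.464 seconds, versus 5.574 seconds before. The code takes advantage of the underlying dense arrays (indptr, indices, data) that define a CSR sparse matrix. Here is my crash course: A.indptr[i]:A.indptr[i+1] defines which elements of the dense arrays correspond to the non-zero values in row i. A.data is a dense 1-D array of the non-zero values of A, and A.indices holds the column of each of those values.

I will do some more testing to make sure this gives the same result as before. I have only checked a few cases.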
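
To make the layout concrete, here is a small sketch on a 3-by-4 matrix chosen purely for illustration. It walks over the rows and slices the column indices (A.indices) and values (A.data) of each row using A.indptr.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix; row 1 is empty on purpose.
A = sparse.csr_matrix(np.array([[0, 2, 0, 3],
                                [0, 0, 0, 0],
                                [1, 0, 4, 0]]))

for i in range(A.shape[0]):
    start, end = A.indptr[i], A.indptr[i + 1]
    cols = A.indices[start:end]   # column of each stored value in row i
    vals = A.data[start:end]      # the stored values themselves
    print(i, cols.tolist(), vals.tolist())
# prints:
# 0 [1, 3] [2, 3]
# 1 [] []
# 2 [0, 2] [1, 4]
```

An empty row shows up as two equal consecutive entries in indptr, so its slices are empty and no special case is needed.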
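
A check along these lines could look like the sketch below, which builds each diagonal both ways on a small random matrix and compares the dense results. The function names diag_dense and diag_sparse, the matrix size, and the seed are assumptions made for the example.

```python
import numpy as np
from scipy import sparse

def diag_dense(A, i):
    # Approach from the question: densify the row, then build the diagonal.
    n = A.shape[1]
    diag_elems = np.array(A.getrow(i).todense())
    return sparse.spdiags(diag_elems, 0, n, n, format="csc")

def diag_sparse(A, i):
    # indptr-based approach: use only the stored entries of row i.
    n = A.shape[1]
    start, end = A.indptr[i], A.indptr[i + 1]
    idx = A.indices[start:end]
    return sparse.csc_matrix((A.data[start:end], (idx, idx)), shape=(n, n))

# Small random CSR matrix; size and seed are arbitrary choices for the test.
A = (sparse.rand(20, 50, 0.2, format="csr", random_state=0) * 5).astype(np.ubyte)

# Rows for which the two constructions disagree (expected to be empty).
mismatches = [i for i in range(A.shape[0])
              if not np.array_equal(diag_dense(A, i).toarray(),
                                    diag_sparse(A, i).toarray())]
```

Comparing through toarray() is only practical for small test matrices; for matrices of the size in the question, a sparse comparison would be needed instead.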

