How do I get the indices of lil_matrix elements in a for loop?

Question

I created a sparse matrix with scipy.sparse.lil_matrix:

import scipy.sparse as sp
test = sp.lil_matrix((3,3))
test[0,0]=1

I can loop over it and print the nonzero elements like this:

for el in test:
    print(el)

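This prints (0, 0) 1.0. How can I access these two pieces of information without printing? In other words, what is the proper way to get both the index and the value of a lil_matrix element? Doing el.data returns array([list([])], dtype=object).

Note that I am using lil_matrix because I need to assign nonzero values to it inside a very large double for loop.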

Answer 1

The display you are after looks a lot like the str display of a coo sparse matrix.

In [216]: M = (sparse.random(5,5,.2)*10).astype(int)
In [217]: M
Out[217]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in COOrdinate format>
In [218]: print(M)   # str(M)
  (0, 0)    0
  (0, 2)    8
  (1, 3)    8
  (1, 4)    8
  (4, 4)    4

Sparse matrices have a nonzero method that returns the coordinates of the nonzero elements.

In [219]: M.nonzero()
Out[219]: (array([0, 1, 1, 4], dtype=int32), array([2, 3, 4, 4], dtype=int32))

For coo, the values are stored as 3 arrays:

In [220]: M.data, M.row, M.col
Out[220]: 
(array([0, 8, 8, 8, 4]),
 array([0, 0, 1, 1, 4], dtype=int32),
 array([0, 2, 3, 4, 4], dtype=int32))

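The coo format places no constraints on the order of these elements. There may even be duplicates, but those are summed when the matrix is converted for display or to csr format.

When we convert it to lil format, the data is stored in 2 object arrays of lists, one list per row: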
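To make the point about duplicates concrete, here is a minimal sketch. The small arrays are made up for illustration; the only claim is that a repeated coordinate is kept as-is in coo and summed on conversion to csr.

```python
import numpy as np
from scipy import sparse

# Illustrative input: the coordinate (0, 1) appears twice.
data = np.array([1.0, 2.0, 3.0])
row = np.array([0, 0, 1])
col = np.array([1, 1, 2])

M = sparse.coo_matrix((data, (row, col)), shape=(2, 3))
stored_in_coo = M.nnz      # coo keeps both duplicate entries

# Converting to csr sums entries that share a coordinate.
C = M.tocsr()
stored_in_csr = C.nnz
value_at_0_1 = C[0, 1]     # 1.0 + 2.0

print(stored_in_coo, stored_in_csr, value_at_0_1)
```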

In [221]: Ml = M.tolil()
In [222]: Ml.data
Out[222]: 
array([list([0, 8]), list([8, 8]), list([]), list([]), list([4])],
      dtype=object)
In [223]: Ml.rows
Out[223]: 
array([list([0, 2]), list([3, 4]), list([]), list([]), list([4])],
      dtype=object)

It also has nonzero, but look at the code (it goes through the coo format):

In [224]: Ml.nonzero()
Out[224]: (array([0, 1, 1, 4], dtype=int32), array([2, 3, 4, 4], dtype=int32))
In [225]: Ml.nonzero??
Signature: Ml.nonzero()
Source:   
    def nonzero(self):
         ...
        # convert to COOrdinate format
        A = self.tocoo()
        nz_mask = A.data != 0
        return (A.row[nz_mask], A.col[nz_mask])
File:      /usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py
Type:      method

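In fact, this is the generic nonzero method shared by all sparse formats. The nz_mask part allows for the possibility that the matrix stores 0 values that have not been cleaned up yet.

While lil is designed for easy element-by-element updates, we generally recommend creating the matrix from coo-style input arrays whenever possible. Those arrays can often be built more efficiently. Even list append or extend can be faster.

Looking more closely at iteration over the Ml matrix - it creates a lil matrix for each row: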
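As a small illustration of the nz_mask remark, the sketch below stores an explicit zero on purpose. The input arrays are made up; nnz, nonzero and eliminate_zeros are the standard scipy.sparse calls.

```python
import numpy as np
from scipy import sparse

# Illustrative input: the entry at (0, 0) is an explicitly stored zero.
data = np.array([0, 8, 4])
row = np.array([0, 0, 2])
col = np.array([0, 2, 2])
M = sparse.coo_matrix((data, (row, col)), shape=(3, 3))

stored = M.nnz                   # counts stored entries, zero included
rows_nz, cols_nz = M.nonzero()   # the mask drops the stored zero

# csr can remove explicit zeros in place.
C = M.tocsr()
C.eliminate_zeros()
stored_after = C.nnz

print(stored, rows_nz, cols_nz, stored_after)
```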
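A rough sketch of the coo-style approach with plain list appends follows. The loop condition and the value formula are hypothetical placeholders standing in for whatever the real double loop computes.

```python
from scipy import sparse

n = 4
rows, cols, vals = [], [], []

for i in range(n):
    for j in range(n):
        # Hypothetical rule deciding which entries are nonzero.
        if (i + j) % 3 == 0:
            rows.append(i)
            cols.append(j)
            vals.append(float(i * n + j + 1))  # hypothetical value

# Build once from the collected triplets, then convert for arithmetic.
M = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
print(M)
```

The matrix is only constructed once, after the loop, so the loop itself touches nothing but Python lists.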

In [230]: [x for x in Ml]
Out[230]: 
[<1x5 sparse matrix of type '<class 'numpy.int64'>'
    with 2 stored elements in List of Lists format>,
 <1x5 sparse matrix of type '<class 'numpy.int64'>'
    with 2 stored elements in List of Lists format>,
 <1x5 sparse matrix of type '<class 'numpy.int64'>'
    with 0 stored elements in List of Lists format>,
 <1x5 sparse matrix of type '<class 'numpy.int64'>'
    with 0 stored elements in List of Lists format>,
 <1x5 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in List of Lists format>]

We can display the data of each row:

In [231]: [((i,x.rows[0]),x.data[0]) for i,x in enumerate(Ml)]
Out[231]: 
[((0, [0, 2]), [0, 8]),
 ((1, [3, 4]), [8, 8]),
 ((2, []), []),
 ((3, []), []),
 ((4, [4]), [4])]

or filter out the empty rows:

In [232]: [((i,x.rows[0]),x.data[0]) for i,x in enumerate(Ml) if x.data[0]]
Out[232]: [((0, [0, 2]), [0, 8]), ((1, [3, 4]), [8, 8]), ((4, [4]), [4])]

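We would need another iteration to separate the elements within each row.

When choosing between sparse and dense arrays, a rule of thumb is that the density (the percentage of nonzero elements, often loosely called sparsity) should be below 10% for the sparse format to be worth it. But that depends a lot on your use case and concerns.

From a pure storage point of view, note that the coo format has to use 3 numbers for each nonzero item, versus just 1 per element for a dense array. Sparse matrix multiplication is relatively good in the csr format. Other calculations that can work on the data values alone (e.g. sin) are also relatively efficient. But when the math has to compare the sparsity patterns of 2 matrices, as in addition and elementwise multiplication, sparse performs worse.

Indexing, slicing, and summing may actually use matrix multiplication. The coo format does not implement these. lil can do some row-oriented operations well. The basic operation of creating a sparse matrix takes time.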
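That extra iteration could look like the sketch below. It reads the rows and data attributes directly instead of iterating over the matrix, which avoids building a 1-row lil matrix per row. The matrix is the one from the question, with one more entry added for illustration.

```python
from scipy import sparse

test = sparse.lil_matrix((3, 3))
test[0, 0] = 1
test[2, 1] = 5   # extra entry, added only for illustration

entries = []
# test.rows[i] holds the column indices of row i,
# test.data[i] holds the matching values.
for i, (col_list, val_list) in enumerate(zip(test.rows, test.data)):
    for j, v in zip(col_list, val_list):
        entries.append((i, j, v))

print(entries)
```

Each tuple carries the row index, the column index, and the value, so nothing has to be parsed back out of the printed form.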
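To put numbers on the storage argument, here is a quick sketch comparing the bytes held by the three coo arrays with the dense equivalent. The shape, density, and seed are arbitrary choices for illustration, and the exact sparse total depends on the index dtype your scipy build picks.

```python
from scipy import sparse

# Arbitrary example: 1000x1000 at 1% density, fixed seed.
M = sparse.random(1000, 1000, density=0.01, format='coo', random_state=0)

# coo keeps one value plus a row and a column index per stored entry.
sparse_bytes = M.data.nbytes + M.row.nbytes + M.col.nbytes
dense_bytes = M.toarray().nbytes   # one float64 per element

print(M.nnz, sparse_bytes, dense_bytes)
```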

Answer 2

Everything is in .data and .rows

from scipy import sparse
arr = sparse.random(10,5,format='lil', density=0.5)

For this 10x5 array with 25 stored elements:

>>> arr
<10x5 sparse matrix of type '<class 'numpy.float64'>'
    with 25 stored elements in List of Lists format>

>>> arr.data.shape
(10,)

>>> arr.data
array([list([0.7656088763162588, 0.7262695483137545]),
       list([0.5229054168281109, 0.6329489698531673, 0.9090750679268123]),
       list([0.3285250285217297, 0.12678874412598085, 0.49074613569184733]),
       list([0.9376762935882884]), list([0.7783159122917774]),
       list([0.8750078624527947, 0.017065437987856757, 0.7161352157970525]),
       list([0.6849637433019786, 0.05732598765212671, 0.09948536587262824]),
       list([0.5683250727980487, 0.960851197599538, 0.7540173942047833]),
       list([0.5891879469424754, 0.7901005027272154, 0.5829700379167293]),
       list([0.6266097436787399, 0.8843420498719459, 0.9040791506861361])],
      dtype=object)

Each element of the .data array is a list containing the values of that row.

>>> arr.rows
array([list([0, 4]), list([0, 1, 4]), list([1, 3, 4]), list([1]),
       list([3]), list([0, 1, 2]), list([0, 1, 4]), list([1, 2, 3]),
       list([0, 2, 4]), list([0, 1, 3])], dtype=object)

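Each element of the .rows array is a list of the column indices of the nonzero values in the corresponding .data list.

Quoting the question: "Note that I am using lil_matrix because I need to assign nonzero values to it inside a very large double for loop."

This is almost certainly not a good idea. The overhead of lil_matrix means that unless fewer than 5% of the elements are nonzero, you are almost certainly better off filling a dense array. Even then it is very iffy. It is a really poor data storage format.

Edit: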
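If flat coordinate arrays are more convenient than nested lists, .rows and .data can be flattened as in this sketch. The tiny matrix and the variable names are illustrative only.

```python
import numpy as np
from scipy import sparse

arr = sparse.lil_matrix((3, 4))
arr[0, 1] = 2.0
arr[2, 0] = 5.0
arr[2, 3] = 7.0

# Number of stored values in each row.
counts = [len(r) for r in arr.rows]

# Repeat each row number once per stored value in that row.
row_idx = np.repeat(np.arange(arr.shape[0]), counts)
col_idx = np.array([c for r in arr.rows for c in r], dtype=int)
values = np.array([v for d in arr.data for v in d])

print(row_idx, col_idx, values)
```

The three arrays line up element by element, in the same layout the coo format uses.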

>>> for r in arr:
...     print(r.data)

[list([0.7656088763162588, 0.7262695483137545])]
[list([0.5229054168281109, 0.6329489698531673, 0.9090750679268123])]
[list([0.3285250285217297, 0.12678874412598085, 0.49074613569184733])]
[list([0.9376762935882884])]
[list([0.7783159122917774])]
[list([0.8750078624527947, 0.017065437987856757, 0.7161352157970525])]
[list([0.6849637433019786, 0.05732598765212671, 0.09948536587262824])]
[list([0.5683250727980487, 0.960851197599538, 0.7540173942047833])]
[list([0.5891879469424754, 0.7901005027272154, 0.5829700379167293])]
[list([0.6266097436787399, 0.8843420498719459, 0.9040791506861361])]

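Edit 2:

I don't know what your actual function or goal is, but if you know how many nonzero items you will have, you can preallocate the arrays you need and skip the whole lil thing.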

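import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the placeholders left open in the original sketch
row_ids = range(100)
col_ids = range(100)
def some_data_function():
    return 1.0

N = len(row_ids) * len(col_ids)      # number of entries to store (10000 here)
data = np.zeros(N)
rows = np.zeros(N, dtype=np.intp)    # index arrays must be integer typed
cols = np.zeros(N, dtype=np.intp)

for i, r in enumerate(row_ids):
    for j, c in enumerate(col_ids):
        _idx = i * len(col_ids) + j  # flat position of the pair (i, j)
        data[_idx] = some_data_function()
        rows[_idx] = r
        cols[_idx] = c

arr = sparse.csr_matrix((data, (rows, cols)))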

Answer 1
2, accepted, 2020-10-13 18:38:24

Answer 2
1, 2020-10-13 18:23:00
