Why does scipy.sparse.csr_matrix broadcast multiplication but not subtraction?

Question

I'd like to understand the solution to this question here. Although I could just reuse the code, I'd rather know what's going on before I do.

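The question is about how to tile a scipy.sparse.csr_matrix object. At the time of writing, the top answer (from @user3357359) shows how to tile a single row of a matrix across multiple rows: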

import numpy as np
from scipy.sparse import csr_matrix
sparse_row = csr_matrix([[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]])
repeat_number = 3
repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) * sparse_row

(I added the sparse_row and repeat_number initialization to help make things concrete.)

If I now convert this to a dense matrix and print it like this:

print(f"repeated_row_matrix.todense() =\n{repeated_row_matrix.todense()}")

this gives the output:

repeated_row_matrix.todense() =
[[0 0 0 0 0 1 0 1 1 0 0 0]
 [0 0 0 0 0 1 0 1 1 0 0 0]
 [0 0 0 0 0 1 0 1 1 0 0 0]]

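The operation on the right-hand side of the repeated_row_matrix assignment looks to me like broadcasting. The original sparse_row has shape (1,12), the temporary matrix is (3,1), and the result is (3,12). So far this is similar to what you would expect from a numpy.array. However, if I try the same thing with the subtraction operator: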

sparse_row = csr_matrix([[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]])
repeat_number = 3
repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) - sparse_row
print(f"repeated_row_matrix.todense() =\n{repeated_row_matrix.todense()}")

I get an error on the third line:

3 repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) - sparse_row
...
ValueError: inconsistent shapes

Is this intended behavior? If so, why?

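My guess is that element-wise multiplication of two sparse K-vectors with n1 and n2 nonzero values, respectively, always yields at most min(n1,n2) nonzero values, while subtraction can yield up to n1+n2 nonzeros in the worst case. But does that really explain why one operation is allowed and the other isn't?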
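The counting argument in the guess above can be checked numerically. This is a minimal sketch with two random sparse row vectors; K, the densities, and the seeds are arbitrary assumptions. It also counts the nonzeros of the outer product a.T @ b, since the tiling trick in the question is a matrix (outer) product rather than an element-wise one, and that can reach n1*n2 nonzeros.

```python
import numpy as np
from scipy import sparse

K = 1000
# Two random sparse 1xK row vectors (density and seeds chosen arbitrarily).
a = sparse.random(1, K, density=0.05, format="csr", random_state=0)
b = sparse.random(1, K, density=0.05, format="csr", random_state=1)
n1, n2 = a.count_nonzero(), b.count_nonzero()

# Element-wise product: nonzero only where BOTH vectors are nonzero.
prod_nnz = a.multiply(b).count_nonzero()
# Difference: nonzero wherever EITHER vector is nonzero (unless values cancel).
diff_nnz = (a - b).count_nonzero()
# Outer product (K,1) @ (1,K): every pair of nonzeros gives a nonzero.
outer_nnz = (a.T @ b).count_nonzero()

print(n1, n2, prod_nnz, diff_nnz, outer_nnz)
```
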

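I want to subtract a single row vector from a matrix (for a sparse K-medoids implementation I'm playing with). To do the subtraction, I create a temporary sparse array that tiles the original row using broadcasting-style multiplication, and then I can subtract one array from the other. I'm sure there must be a better way, but I don't see it.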
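A minimal sketch of that workaround, assuming a small hypothetical data matrix X and picking its last row as the "medoid" (the names are illustrative, not from any K-medoids library). It uses `@`, which for csr_matrix is the same matrix product that `*` performs in the question: a (n,1) column of ones times the (1,6) row.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical data: 4 points with 6 sparse features.
X = csr_matrix(np.array([
    [0, 1, 0, 0, 2, 0],
    [3, 0, 0, 0, 0, 0],
    [0, 0, 0, 4, 0, 0],
    [0, 1, 0, 0, 0, 5],
]))
medoid = X[3]  # a 1x6 sparse row (csr_matrix indexing keeps it 2-D)

# Tile the row with a (4,1) @ (1,6) matrix product: an outer product with ones.
ones_col = csr_matrix(np.ones((X.shape[0], 1), dtype=X.dtype))
tiled = ones_col @ medoid  # shape (4, 6)

# Shapes now match exactly, so sparse subtraction is allowed.
diff = X - tiled
print(diff.toarray())
```
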

Also, @"CJ Jackson" replied in a comment that a better way to build the tiling is:

sparse_row[np.zeros(repeat_number),:]

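This works, but I don't know why, or what feature it relies on. Can someone point me to the documentation? If sparse_row were a numpy.array, this would not result in tiling.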
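The commenter's expression is ordinary integer-array ("fancy") row indexing: each entry of the index array selects one row, so an array of repeated zeros selects row 0 repeatedly. A sketch comparing the sparse and dense cases; note that NumPy requires an integer index array, and `np.zeros(repeat_number)` on its own is float64, which NumPy rejects with an IndexError. With `dtype=int`, a NumPy array tiles the same way.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense_row = np.array([[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]])  # shape (1, 12)
sparse_row = csr_matrix(dense_row)
repeat_number = 3

# Index array that picks row 0 three times; integer dtype for NumPy's sake.
idx = np.zeros(repeat_number, dtype=int)

tiled_sparse = sparse_row[idx, :]   # sparse fancy row indexing
tiled_dense = dense_row[idx, :]     # the same indexing on a NumPy array

print(tiled_sparse.shape, tiled_dense.shape)  # (3, 12) (3, 12)
```
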

Thanks in advance.

Answer 1

For dense arrays, broadcast multiplication and matrix multiplication can do the same thing in special cases. For example, with two 1d arrays:

In [3]: x = np.arange(3); y = np.arange(5)

Broadcasting:

In [4]: x[:,None]*y   # (3,1)*(5,) => (3,1)*(1,5) => (3,5)
Out[4]: 
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4],
       [0, 2, 4, 6, 8]])

Dot/matrix multiplication of (3,1) and (1,5). This isn't broadcasting. It does a sum of products over the shared size-1 dimension:

In [5]: x[:,None]@y[None,:]
Out[5]: 
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4],
       [0, 2, 4, 6, 8]])

Making sparse matrices from these:

In [6]: Mx = sparse.csr_matrix(x);My = sparse.csr_matrix(y)    
In [11]: Mx
Out[11]: 
<1x3 sparse matrix of type '<class 'numpy.intc'>'
    with 2 stored elements in Compressed Sparse Row format>    
In [12]: My
Out[12]: 
<1x5 sparse matrix of type '<class 'numpy.intc'>'
    with 4 stored elements in Compressed Sparse Row format>

Note the shapes, (1,3) and (1,5). To do the matrix multiplication, the first one has to be transposed to (3,1):

In [13]: Mx.T@My
Out[13]: 
<3x5 sparse matrix of type '<class 'numpy.intc'>'
    with 8 stored elements in Compressed Sparse Column format>

In [14]: _.A
Out[14]: 
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4],
       [0, 2, 4, 6, 8]], dtype=int32)

Mx.T*My works the same way, because scipy.sparse is modeled on np.matrix (and MATLAB), where * is matrix multiplication.
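A short sketch contrasting the operators, assuming the spmatrix classes such as csr_matrix used throughout this thread. For them, `*` is the matrix product, identical to `@`; for plain ndarrays, `*` is element-wise with broadcasting. (The newer sparse array classes such as csr_array follow the ndarray convention and treat `*` as element-wise instead.)

```python
import numpy as np
from scipy import sparse

x = np.arange(3)
y = np.arange(5)
Mx = sparse.csr_matrix(x)   # shape (1, 3)
My = sparse.csr_matrix(y)   # shape (1, 5)

# spmatrix: `*` and `@` are both the (3,1) @ (1,5) matrix product.
star = (Mx.T * My).toarray()
at = (Mx.T @ My).toarray()

# ndarray: `*` broadcasts element-wise; `@` is the matrix product.
dense_star = x[:, None] * y[None, :]
dense_at = x[:, None] @ y[None, :]

print(star)
```

In this rank-1 case all four results coincide, which is why the question's tiling trick looks like broadcasting even though it is a matrix product.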

Element-wise multiplication works the same as with dense arrays:

In [20]: Mx.T.multiply(My)
Out[20]: 
<3x5 sparse matrix of type '<class 'numpy.intc'>'
    with 8 stored elements in Compressed Sparse Column format>

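I'm a bit surprised that this does look somewhat like broadcasting, though it doesn't involve any automatic None dimensions (sparse is always 2d). Curiously, I can't find an element-wise multiply for dense matrices.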
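A sketch checking that the sparse `multiply` method, given a (3,1) column and a (1,5) row, produces the same (3,5) result as dense broadcasting. It reuses the x and y from the session above.

```python
import numpy as np
from scipy import sparse

x = np.arange(3)
y = np.arange(5)
Mx = sparse.csr_matrix(x)   # shape (1, 3)
My = sparse.csr_matrix(y)   # shape (1, 5)

# Element-wise multiply of a (3,1) column by a (1,5) row: result is (3,5),
# even though sparse matrices never insert a None dimension automatically.
prod = Mx.T.multiply(My)
print(prod.shape)  # (3, 5)
```
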

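But, as you found, Mx.T-My raises an "inconsistent shapes" error. The sparse developers chose not to implement this kind of subtraction (or addition). In general, addition or subtraction is a problem for sparse matrices: if you add something to every element, including the "implied" 0s, the result can easily become dense.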

In [41]: Mx+1
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Input In [41], in <cell line: 1>()
----> 1 Mx+1

File ~\anaconda3\lib\site-packages\scipy\sparse\base.py:410, in spmatrix.__add__(self, other)
    408         return self.copy()
    409     # Now we would add this scalar to every element.
--> 410     raise NotImplementedError('adding a nonzero scalar to a '
    411                               'sparse matrix is not supported')
    412 elif isspmatrix(other):
    413     if other.shape != self.shape:

NotImplementedError: adding a nonzero scalar to a sparse matrix is not supported
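The fill-in that this error guards against is easy to see by doing the addition on a dense copy. A sketch with a hypothetical random 1000x1000 matrix at 0.1% density: adding 1 turns every implicit zero into a stored value, so the result is fully dense.

```python
import numpy as np
from scipy import sparse

# Hypothetical sparse matrix with about 0.1% of entries stored.
X = sparse.random(1000, 1000, density=0.001, format="csr", random_state=0)
print(X.nnz)

# What "X + 1" would have to mean: all implicit zeros become 1.
dense_plus_one = X.toarray() + 1
print(np.count_nonzero(dense_plus_one))  # 1000000: every entry is now nonzero
```
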

To replicate the broadcast subtraction:

In [54]: x[:,None]-y
Out[54]: 
array([[ 0, -1, -2, -3, -4],
       [ 1,  0, -1, -2, -3],
       [ 2,  1,  0, -1, -2]])

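we have to "tile" the matrices. Your link shows some options (including my answer). Another option is to vstack multiple instances of the matrix. sparse.vstack builds a new matrix; in general it assembles the blocks through the coo format, although with all-csr inputs like these the result comes back as csr, as the output shows: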

In [55]: Mxx = sparse.vstack([Mx]*5);Myy = sparse.vstack([My,My,My])    
In [56]: Mxx,Myy
Out[56]: 
(<5x3 sparse matrix of type '<class 'numpy.intc'>'
    with 10 stored elements in Compressed Sparse Row format>,
 <3x5 sparse matrix of type '<class 'numpy.intc'>'
    with 12 stored elements in Compressed Sparse Row format>)

Now the two (3,5) matrices can be added or subtracted:

In [57]: Mxx.T-Myy
Out[57]: 
<3x5 sparse matrix of type '<class 'numpy.intc'>'
    with 12 stored elements in Compressed Sparse Column format>

In [58]: _.A
Out[58]: 
array([[ 0, -1, -2, -3, -4],
       [ 1,  0, -1, -2, -3],
       [ 2,  1,  0, -1, -2]], dtype=int32)
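A sketch putting the tiling options from this thread side by side: vstack, the repeated row index from the comment, and the ones-column matrix product from the question. It then repeats the subtraction shown in the session above. The variable names are illustrative.

```python
import numpy as np
from scipy import sparse

x = np.arange(3)
y = np.arange(5)
Mx = sparse.csr_matrix(x)   # shape (1, 3)
My = sparse.csr_matrix(y)   # shape (1, 5)

# Three ways to build a (3, 5) matrix whose rows are all copies of My.
by_vstack = sparse.vstack([My] * 3)                                  # stacking
by_index = My[np.zeros(3, dtype=int), :]                             # repeated row index
by_outer = sparse.csr_matrix(np.ones((3, 1), dtype=My.dtype)) @ My   # (3,1) @ (1,5)

# Column side: stack Mx five times to (5, 3), then transpose to (3, 5).
Mxx_T = sparse.vstack([Mx] * 5).T

# With matching (3, 5) shapes, sparse subtraction is allowed.
result = (Mxx_T - by_vstack).toarray()
print(result)
```
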

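In the linear algebra world where sparse math was developed (especially finite differences and finite elements), matrix multiplication is what matters. Other math operations that act only on the nonzero elements are fairly simple. But operations that change the sparsity are (relatively) expensive. New values and submatrices are best added to the coo inputs; coo duplicates are summed when converting to csr. So addition and subtraction of whole matrices are (intentionally) restricted: the shapes must match exactly, and adding a nonzero scalar is not supported.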
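A small sketch of the coo route described above, with hypothetical triplet data: the (row, col) pair (0, 1) is given twice, stays as two separate stored entries in coo, and is summed when converting with tocsr().

```python
import numpy as np
from scipy import sparse

# Hypothetical triplets: position (0, 1) appears twice, with values 2 and 5.
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 1, 0, 2])
vals = np.array([2, 5, 3, 4])

A = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))
print(A.nnz)     # 4: the duplicate is still stored separately in coo
B = A.tocsr()    # conversion to csr sums duplicate entries
print(B[0, 1])   # 7
print(B.nnz)     # 3
```
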

Solution 1
0 2022-12-01 05:37:09

為什么 scipy.sparse.csr_matrix 廣播乘法而不廣播減法？

問題描述

1 個解決方案

解決方案1 0 2022-12-01 05:37:09

解決方案1
0 2022-12-01 05:37:09