Python uses too much memory

Question

I am importing sparse matrices from an .npz file. Below is the relevant part of the script. The sparse matrices (Dx, Dy, ..., M) are 373248x373248, each with 746496 stored elements.

import numpy as np

if runmode == 2:
    data = np.load('Operators2.npz', allow_pickle=True)

    Dx = data['Dx']
    Dy = data['Dy']
    Dz = data['Dz']
    Dxx = data['Dxx']
    Dyy = data['Dyy']
    Dzz = data['Dzz']
    Dxp = data['Dxp']
    Dyp = data['Dyp']
    Dzp = data['Dzp']
    M = data['M']
    del data

If I print one of the variables, for example Dx, I get the following output:

array(<373248x373248 sparse matrix of type '<class 'numpy.float64'>'
    with 746496 stored elements in Compressed Sparse Column format>,
      dtype=object)

But my system memory keeps rising and the program crashes when I execute the following line. I don't get any error message; the program just crashes.

DIV = Dx*u+Dy*v+Dz*w

Even if I execute the following line instead, memory consumption increases and the program crashes:

DIV = data['Dx']*u+data['Dy']*v+data['Dz']*w

Here u, v, w have shape 373248x1, and DIV has shape 373248x1. Since Dx, Dy, Dz are sparse matrices, Dx*u performs a matrix-vector multiplication and yields a vector.
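For reference, a small-scale sketch of the intended computation when the operators are plain SciPy sparse matrices. The size n = 1000, the first-difference stencil used for Dx, and the scaled copies standing in for Dy and Dz are illustrative assumptions, not the question's actual operators. Each product is a sparse matrix times a dense (n, 1) array and comes back as a dense float64 column:

```python
import numpy as np
from scipy import sparse

n = 1000  # small stand-in for the question's 373248

# Hypothetical first-difference operator (main diagonal -1, superdiagonal +1), CSC format
Dx = sparse.diags([-1.0, 1.0], [0, 1], shape=(n, n), format='csc')
Dy = 2.0 * Dx  # illustrative stand-ins for the other operators
Dz = 3.0 * Dx

# Column vectors of shape (n, 1), as in the question
x = np.arange(n, dtype=float).reshape(n, 1)
u, v, w = x, x, x

# Sparse matrix @ dense column -> dense (n, 1) float64 array
DIV = Dx @ u + Dy @ v + Dz @ w
print(DIV.shape, DIV.dtype)  # (1000, 1) float64
```

`@` is used here because for the newer `scipy.sparse` array classes (`csr_array`, `csc_array`) `*` means element-wise multiplication; for the `*_matrix` classes in the question, `*` and `@` give the same matrix product.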

If I instead compute Dx, Dy, ..., M directly in the same code, there is no memory problem. When I compute Dx, printing it gives:

<373248x373248 sparse matrix of type '<class 'numpy.float64'>'
    with 746496 stored elements in Compressed Sparse Column format>

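So I think the problem is the object that gets created on import. Is there a way to avoid this? Or am I doing something wrong when importing the sparse matrices? Thank you.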

Answer 1

Make a sparse matrix:

In [38]: M = sparse.random(1000,1000,.2,'csr')

Save it 3 different ways:

In [39]: from scipy import io                                                                  
In [40]: np.savez('Msparse.npz', M=M)                                                          
In [41]: sparse.save_npz('M1sparse',M)                                                         

In [43]: io.savemat('Msparse.mat', {'M':M})

File sizes:

In [47]: ll M1spa* Mspar*                                                                      
-rw-rw-r-- 1 paul 1773523 Feb  1 12:40 M1sparse.npz
-rw-rw-r-- 1 paul 2404208 Feb  1 12:41 Msparse.mat
-rw-rw-r-- 1 paul 2404801 Feb  1 12:39 Msparse.npz

Load the 3 matrices:

In [48]: M1=sparse.load_npz('M1sparse.npz')                                                    
In [49]: M2=np.load('Msparse.npz',allow_pickle=True)['M']                                      
In [50]: M3=io.loadmat('Msparse.mat')['M']                                                     
In [51]: M1                                                                                    
Out[51]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 200000 stored elements in Compressed Sparse Row format>
In [52]: M2                                                                                    
Out[52]: 
array(<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 200000 stored elements in Compressed Sparse Row format>,
      dtype=object)
In [53]: M3                                                                                    
Out[53]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 200000 stored elements in Compressed Sparse Column format>

M1 and M3 are the same matrix: save_npz keeps csr (like M), while the .mat file uses csc (the MATLAB format).

M2 has an object dtype wrapper.
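A minimal sketch reproducing that wrapper, assuming a temporary file in place of the real .npz: `np.savez` cannot store a sparse matrix as a numeric array, so it saves it as a pickled 0-d object array, and `.item()` gets the matrix back.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

M = sparse.random(100, 100, density=0.2, format='csr', random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Msparse.npz')
    # The sparse matrix is not an ndarray, so savez stores it as a pickled object
    np.savez(path, M=M)

    with np.load(path, allow_pickle=True) as data:
        M2 = data['M']  # 0-d object array wrapping the matrix

print(type(M2), M2.dtype, M2.shape)
inner = M2.item()  # unwrap the actual sparse matrix
print(sparse.issparse(inner), inner.format)
```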

In [54]: (M1*np.ones((1000,1))).shape                                                          
Out[54]: (1000, 1)
In [55]: (M3*np.ones((1000,1))).shape                                                          
Out[55]: (1000, 1)

This took noticeably longer; I almost didn't dare look at the result.

In [56]: (M2*np.ones((1000,1))).shape                                                          
Out[56]: (1000, 1)

If I extract the matrix from the object array, the multiplication is fast:

In [57]: (M2.item()*np.ones((1000,1))).shape                                                   
Out[57]: (1000, 1)
In [58]: (M2.item()*np.ones((1000,1))).dtype                                                   
Out[58]: dtype('float64')
In [59]: (M3*np.ones((1000,1))).dtype                                                          
Out[59]: dtype('float64')

Looking more closely at the M2 multiplication:

In [60]: (M2*np.ones((1000,1))).dtype                                                          
Out[60]: dtype('O')
In [61]: (M2*np.ones((1000,1)))[:2,:]                                                          
Out[61]: 
array([[<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 200000 stored elements in Compressed Sparse Row format>],
       [<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 200000 stored elements in Compressed Sparse Row format>]],
      dtype=object)

It multiplies M by each element of ones, producing 1000 sparse matrices. That's where your memory is going.
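A scaled-down reproduction, assuming a 50x50 random matrix in place of Dx; the 0-d object array is built by hand here to mimic what `data['Dx']` returns. The last lines are only a back-of-envelope estimate for the question's sizes, assuming float64 values and int32 indices (about 12 bytes per stored element) and ignoring `indptr` and Python object overhead.

```python
import numpy as np
from scipy import sparse

n = 50
M = sparse.random(n, n, density=0.2, format='csr', random_state=0)

# 0-d object array holding M, like data['Dx'] loaded from an np.savez file
wrapped = np.empty((), dtype=object)
wrapped[()] = M

ones = np.ones((n, 1))
bad = wrapped * ones          # object broadcasting: one M * 1.0 per element of ones
good = wrapped.item() @ ones  # real sparse matrix-vector product

print(bad.dtype, bad.shape)        # object (50, 1)
print(good.dtype, good.shape)      # float64 (50, 1)
print(sparse.issparse(bad[0, 0]))  # True: each entry is a whole sparse matrix

# Rough estimate for the question: 373248 copies of a matrix with 746496 stored
# elements, at ~12 bytes each (8-byte float64 value + 4-byte int32 index)
copies, nnz, bytes_per_elem = 373248, 746496, 12
estimate_tib = copies * nnz * bytes_per_elem / 2**40
print(f"~{estimate_tib:.1f} TiB")  # ~3.0 TiB
```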

In short, savez wraps each sparse matrix in an object dtype array and pickles it. So you shouldn't use `data['Dx']` directly:

Dx = data['Dx']  # wrong
Dx = data['Dx'].item()    # right
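Another option is `sparse.save_npz` (In [41] above), which writes one matrix per file as plain arrays, without pickling. A sketch assuming one hypothetical file per operator (Dx.npz, Dy.npz) in a temporary directory; the operators themselves are small illustrative stand-ins.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

n = 100
ops = {
    'Dx': sparse.diags([-1.0, 1.0], [0, 1], shape=(n, n), format='csc'),
    'Dy': sparse.diags([1.0, -1.0], [0, 1], shape=(n, n), format='csc'),
}

with tempfile.TemporaryDirectory() as tmp:
    # save_npz stores a single sparse matrix per file (hypothetical file layout)
    for name, op in ops.items():
        sparse.save_npz(os.path.join(tmp, f'{name}.npz'), op)

    # load_npz returns the sparse matrix itself: no allow_pickle, no .item()
    Dx = sparse.load_npz(os.path.join(tmp, 'Dx.npz'))
    Dy = sparse.load_npz(os.path.join(tmp, 'Dy.npz'))

u = np.ones((n, 1))
print(Dx.format, (Dx @ u + Dy @ u).shape)
```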

Answer 2

I was able to solve this by using scipy.io.savemat() and scipy.io.loadmat(). I had previously used np.savez() and np.load(), which took too much memory.
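A sketch of that approach with several operators in one file, assuming the names Dx/Dy and a temporary Operators2.mat. `loadmat` returns each entry as a sparse (CSC) matrix directly, so neither `allow_pickle` nor `.item()` is involved.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

n = 100
Dx = sparse.diags([-1.0, 1.0], [0, 1], shape=(n, n), format='csc')
Dy = 2.0 * Dx  # illustrative stand-in

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Operators2.mat')
    # MAT files store sparse matrices natively, so nothing is pickled
    io.savemat(path, {'Dx': Dx, 'Dy': Dy})
    data = io.loadmat(path)

Dx2, Dy2 = data['Dx'], data['Dy']  # already sparse matrices
print(sparse.issparse(Dx2), Dx2.shape)
```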

Answer 3

This computation is well suited to in-place operations:

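# data['Dx'] etc. are 0-d object arrays from np.savez; .item() unwraps the sparse matrix.
# np.multiply is element-wise and cannot write a sparse product into `out`, so compute
# each matrix-vector product and accumulate the sum in place in one output array.
DIV = data['Dx'].item() @ u
DIV += data['Dy'].item() @ v
DIV += data['Dz'].item() @ w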

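Summing in place like this avoids creating extra temporary arrays for the intermediate sums. If you also avoid loading the other variables that this particular calculation doesn't need, you'll save additional memory. Doing the work inside a function is a good way to make sure memory is cleaned up and freed once the work is done. In your case it may be better to pay the read penalty and read only the specific data each such calculation needs. But as hpaulj said, there isn't much magic to be found: this is a large dataset that needs a lot of memory. Is there any way to reduce the size/resolution of the problem, or to work in smaller chunks?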
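A sketch of the function-scoped variant, assuming the operators were saved with `np.savez` as in the question (hence `.item()`); `divergence` is a hypothetical name and the operators are small stand-ins. `np.load` on an .npz file reads each entry only when it is accessed, so only one operator is unpickled at a time, and the products are summed in place into a single output array.

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def divergence(path, u, v, w):
    """Hypothetical helper: Dx@u + Dy@v + Dz@w, loading one operator at a time."""
    div = np.zeros(u.shape)
    with np.load(path, allow_pickle=True) as data:
        for name, vec in (('Dx', u), ('Dy', v), ('Dz', w)):
            op = data[name].item()  # only this entry is read and unpickled
            div += op @ vec         # accumulate in place
            del op                  # release it before the next one is loaded
    return div


n = 1000
Dx = sparse.diags([-1.0, 1.0], [0, 1], shape=(n, n), format='csc')
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Operators2.npz')
    np.savez(path, Dx=Dx, Dy=2.0 * Dx, Dz=3.0 * Dx)
    x = np.arange(n, dtype=float).reshape(n, 1)
    DIV = divergence(path, x, x, x)

print(DIV.shape)
```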

Question description

3 solutions

Solution 1
1 vote, accepted, 2020-02-01 21:10:06

Solution 2
0 votes, 2020-02-01 20:30:46

Solution 3
0 votes, 2020-02-01 21:12:37