Optimizing matrix writes in python / numpy

Question

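I am currently trying to optimize a piece of code. The gist of it is that we go through and compute a bunch of values and write them into a matrix. The order of computation does not matter: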

mat =  np.zeros((n, n))
mat.fill(MAX_VAL)
for i in xrange(0, smallerDim):
    for j in xrange(0,n):
        similarityVal = doACalculation(i,j, data, cache)
        mat[i][j] = abs(1.0 / (similarityVal + 1.0))

I profiled this code and found that about 90% of the time is spent writing the value back into the matrix (the last line).

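I would like to know the best way to do this kind of computation so that the writes are optimized. Should I write to an intermediate buffer and copy in whole rows, or something similar? I know next to nothing about performance tuning or numpy internals.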

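Edit: doACalculation is not a side-effect-free function. It takes in some data (assume this is some python object) and also a cache, to which it writes and from which it reads some intermediate steps. I am not sure it can easily be vectorized. I tried using numpy.vectorize as suggested, but did not see a significant speedup over the naive for loop. (I passed in the additional data via a state variable.)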
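The intermediate-buffer idea raised in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the body of doACalculation, the value of MAX_VAL, and the helper names fill_by_rows and fill_by_elements are hypothetical stand-ins, and `range` is used instead of `xrange` so the sketch also runs on Python 3. Each row is collected in a plain Python list, then transformed and stored with one array operation, so the stateful function is still called once per element.

```python
import numpy as np

MAX_VAL = 10.0  # hypothetical placeholder; the real value is not given


def doACalculation(i, j, data, cache):
    # Hypothetical stand-in for the real, stateful function:
    # it reads and writes intermediate results in `cache`.
    key = (i, j)
    if key not in cache:
        cache[key] = data[i] * data[j]
    return cache[key]


def fill_by_rows(n, smallerDim, data, cache):
    mat = np.full((n, n), MAX_VAL)
    for i in range(smallerDim):
        # Collect one row of results in a plain Python list ...
        row = [doACalculation(i, j, data, cache) for j in range(n)]
        # ... then transform and store the whole row in one array operation.
        mat[i, :] = np.abs(1.0 / (np.asarray(row, dtype=float) + 1.0))
    return mat


def fill_by_elements(n, smallerDim, data, cache):
    # The element-by-element loop from the question, for comparison.
    mat = np.full((n, n), MAX_VAL)
    for i in range(smallerDim):
        for j in range(n):
            mat[i][j] = abs(1.0 / (doACalculation(i, j, data, cache) + 1.0))
    return mat


data = [0.5 * k for k in range(5)]
by_rows = fill_by_rows(5, 3, data, {})
by_elements = fill_by_elements(5, 3, data, {})
print(np.allclose(by_rows, by_elements))
```

Whether this is actually faster depends on how expensive doACalculation is compared with the per-element indexing, so it is worth timing both variants on the real data.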

Answer 1

Wrapping this in numba autojit will improve performance considerably.

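import numpy as np
import numba


def doACalculationVector(n, smallerDim):
    # Simplified stand-in: every similarity value is 2.
    return np.ones((smallerDim, n)) + 1


def testVector():
    n = 1000
    smallerDim = 800
    mat = np.zeros((n, n))
    mat.fill(10)
    # Fill the first smallerDim rows with a single array operation.
    mat[:smallerDim] = abs(1.0 / (doACalculationVector(n, smallerDim) + 1.0))
    return mat


# Note: numba.autojit is the decorator these timings were taken with;
# later numba releases provide numba.jit instead.
@numba.autojit
def doACalculationNumba(i, j):
    # Simplified stand-in: every similarity value is 2.
    return 2


@numba.autojit
def testNumba():
    n = 1000
    smallerDim = 800
    mat = np.zeros((n, n))
    mat.fill(10)
    for i in xrange(0, smallerDim):
        for j in xrange(0, n):
            mat[i, j] = abs(1.0 / (doACalculationNumba(i, j) + 1.0))
    return mat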

Timing of the original for reference (with mat[i][j] changed to mat[i,j]):

In [24]: %timeit test()
1 loops, best of 3: 226 ms per loop

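I simplified the functions a bit, since that is all that was provided. Even so, testNumba is roughly 40 times faster than test() in these timings, and about 3 times faster than the vectorized version: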

In [20]: %timeit testVector()
100 loops, best of 3: 17.9 ms per loop

In [21]: %timeit testNumba()
100 loops, best of 3: 5.91 ms per loop
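The change from mat[i][j] to mat[i,j] mentioned in this answer can be illustrated with a small sketch. With mat[i][j], NumPy first builds a temporary view of row i and then indexes into that view, while mat[i,j] is a single indexing operation; both end up writing to the same element. The function names write_chained and write_tuple are hypothetical, and `range` is used so the sketch runs on Python 3. The timings it prints vary by machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np


def write_chained(mat, rows, cols):
    # mat[i] creates a row view first, then [j] indexes into that view.
    for i in range(rows):
        for j in range(cols):
            mat[i][j] = 1.0


def write_tuple(mat, rows, cols):
    # mat[i, j] addresses the element in one indexing operation.
    for i in range(rows):
        for j in range(cols):
            mat[i, j] = 1.0


a = np.zeros((200, 200))
b = np.zeros((200, 200))
write_chained(a, 100, 100)
write_tuple(b, 100, 100)
same = np.array_equal(a, b)  # both styles write to the same elements

t_chained = timeit.timeit(lambda: write_chained(a, 100, 100), number=20)
t_tuple = timeit.timeit(lambda: write_tuple(b, 100, 100), number=20)
print("chained: %.4f s, tuple: %.4f s" % (t_chained, t_tuple))
```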

Answer 2

If you can vectorize doACalculation, the task becomes easy:

similarityArray = doACalculation(np.indices((smallerDim, n)))
mat[:smallerDim] = np.abs(1.0 / (similarityArray + 1))

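Assuming you vectorize doACalculation properly, this should be at least an order of magnitude faster. In general, when working with NumPy arrays, you want to avoid explicit loops and element-by-element access as much as possible.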

For reference, an example of a possible vectorization of doACalculation:

# Unvectorized
def doACalculation(i, j):
    return i**2 + i*j + j

# Vectorized
def doACalculation(input):
    i, j = input
    return i**2 + i*j + j

# Vectorized, but with the original call signature
def doACalculation(i, j):
    return i**2 + i*j + j

Yes, the last version really is identical to the unvectorized function. Sometimes it is that easy.
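To make the point above concrete, here is a small sketch that runs the same doACalculation once inside an explicit loop and once on whole index arrays, then checks that the two results agree. The sizes and MAX_VAL are hypothetical placeholders, and `range` is used so it runs on Python 3. With the two-argument signature, the result of np.indices is unpacked into i and j before the call.

```python
import numpy as np

MAX_VAL = 10.0  # hypothetical placeholder value
n, smallerDim = 8, 5


def doACalculation(i, j):
    # Works unchanged for scalars and for NumPy arrays.
    return i**2 + i*j + j


# Explicit loop, one element at a time.
loop_mat = np.full((n, n), MAX_VAL)
for i in range(smallerDim):
    for j in range(n):
        loop_mat[i, j] = abs(1.0 / (doACalculation(i, j) + 1.0))

# Vectorized: np.indices returns one array of row indices and one of
# column indices, each of shape (smallerDim, n).
i_idx, j_idx = np.indices((smallerDim, n))
vec_mat = np.full((n, n), MAX_VAL)
vec_mat[:smallerDim] = np.abs(1.0 / (doACalculation(i_idx, j_idx) + 1.0))

print(np.allclose(loop_mat, vec_mat))
```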

Answer 3

Even if you cannot vectorize doACalculation(), you can use numpy.vectorize() to speed up the computation. Here is a test.

import numpy as np
n = 1000
smallerDim = 500

def doACalculation(i, j):
    return i+j

The for-loop version:

%%timeit
mat =  np.zeros((n, n))

for i in xrange(0, smallerDim):
    for j in xrange(0,n):
        similarityVal = doACalculation(i,j)
        mat[i,j] = abs(1.0 / (similarityVal + 1.0))

Output:

1 loops, best of 3: 183 ms per loop

The vectorize() version:

%%timeit
mat2 =  np.zeros((n, n))
i, j = np.ix_(np.arange(smallerDim), np.arange(n))
f = np.vectorize(doACalculation, "d")
mat2[:smallerDim] = np.abs(1.0/(f(i, j) + 1))

Output:

10 loops, best of 3: 97.3 ms per loop

Check the result:

np.allclose(mat,mat2)

Output:

True

This method does not make the doACalculation() calls themselves any faster, but it allows the subsequent computation to be vectorized.
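For the four-argument doACalculation(i, j, data, cache) described in the question, one possible way to apply the same idea is the `excluded` parameter of numpy.vectorize, which passes the listed positional arguments through to the function unchanged. The sketch below is an assumption about how that could look: the body of doACalculation, the shape of data, and the sizes are hypothetical. NumPy's documentation describes vectorize as a convenience that is implemented essentially as a for loop, so the function is still called once per element.

```python
import numpy as np


def doACalculation(i, j, data, cache):
    # Hypothetical stand-in: memoizes a per-row value in `cache`.
    if i not in cache:
        cache[i] = sum(data[i])
    return cache[i] + j


n, smallerDim = 6, 4
data = [[float(k), 1.0] for k in range(n)]  # hypothetical input data
cache = {}

# Positional arguments 2 and 3 (data, cache) are not broadcast; they are
# handed to doACalculation as-is. otypes is set explicitly so that
# vectorize does not make an extra call to infer the output type.
f = np.vectorize(doACalculation, otypes=[float], excluded={2, 3})

i, j = np.ix_(np.arange(smallerDim), np.arange(n))
mat = np.full((n, n), 10.0)
mat[:smallerDim] = np.abs(1.0 / (f(i, j, data, cache) + 1.0))
print(mat[:smallerDim])
```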

3 answers

Answer 1
4 2013-12-26 00:44:21

Answer 2
2 accepted 2013-12-26 00:41:32

Answer 3
1 2013-12-26 01:11:36
