Optimizing a Python function that uses NumPy arrays

Question

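I have been trying to optimize a Python script I wrote over the past two days. Using several profiling tools (cProfile, line_profiler, etc.), I narrowed the problem down to the function below.

df is a NumPy array with 3 columns and over 1,000,000 rows (dtype float). Using line_profiler, I found that the function spends most of its time wherever it needs to access the NumPy array.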

full_length += head + df[rnd_truck, 2]

and

full_weight += df[rnd_truck,1]

take most of the time, followed by the

full_length = df[rnd_truck,2]

full_weight = df[rnd_truck,1]

lines.

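As far as I can tell, the bottleneck is the access time whenever the function fetches a single number from the NumPy array.

When I run the function as MonteCarlo(df, 15., 1000.), calling it takes 37 seconds on a 64-bit Windows machine with a 3.40 GHz i7 and 8 GB of RAM. In my application I need to run it 1,000,000,000 times to ensure convergence, which pushes the execution time to more than an hour. I tried using operator.add for the summation, but it did not help at all. It looks like I will have to come up with a faster way to access this NumPy array.

Any ideas are welcome!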

import numpy as np

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]

    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]

    # Return average weight per feet on the bridge
    return(full_weight/span)

Here is part of the df NumPy array I am using:

In [31]: df
Out[31]: 
array([[  12. ,  220.4,  108.4],
       [  11. ,  220.4,  106.2],
       [  11. ,  220.3,  113.6],
       ..., 
       [   4. ,   13.9,   36.8],
       [   3. ,   13.7,   33.9],
       [   3. ,   13.7,   10.7]])
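
The access-time observation above can be checked in isolation. The following is a minimal sketch (Python 3, NumPy 1.17 or later), not part of the original script: it sums the same randomly chosen entries once through scalar indexing of a NumPy array and once through a plain Python list. The random data, the sizes, and the names sum_from_array and sum_from_list are assumptions made for illustration.

```python
import timeit

import numpy as np

# Stand-in data: a smaller random array with the same 3-column layout.
rng = np.random.default_rng(0)
df = rng.random((100_000, 3))
idx = rng.integers(0, len(df), size=10_000).tolist()

# The same length column, copied once into a plain Python list.
lengths_list = df[:, 2].tolist()


def sum_from_array():
    # Each df[i, 2] creates a NumPy scalar object on every access.
    total = 0.0
    for i in idx:
        total += df[i, 2]
    return total


def sum_from_list():
    # Indexing a list returns an existing Python float.
    total = 0.0
    for i in idx:
        total += lengths_list[i]
    return total


t_array = timeit.timeit(sum_from_array, number=20)
t_list = timeit.timeit(sum_from_list, number=20)
print("array indexing: %.4f s, list indexing: %.4f s" % (t_array, t_list))
```

Both functions add the same values in the same order, so they return the same total; only the cost of each element access differs.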

Answer 1

As others have pointed out, this is not vectorized at all, so the slowness really comes from the Python interpreter. Cython can help you a lot with minimal changes:

>>> %timeit MonteCarlo(df, 5, 1000)
10000 loops, best of 3: 48 us per loop

>>> %timeit MonteCarlo_cy(df, 5, 1000)
100000 loops, best of 3: 3.67 us per loop

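where MonteCarlo_cy is defined as follows (in an IPython notebook, after running %load_ext cythonmagic; newer Cython releases provide the same magic through %load_ext Cython):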

%%cython
import numpy as np
cimport numpy as np

def MonteCarlo_cy(double[:, ::1] df, double head, double span):
    # Pick initial truck
    cdef long n = df.shape[0]
    cdef long rnd_truck = np.random.randint(0, n)
    cdef double full_weight = df[rnd_truck, 1]
    cdef double full_length = df[rnd_truck, 2]

    # Loop using other random truck until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, n)
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per feet on the bridge
    return full_weight / span
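
For comparison with the remark that the loop is not vectorized, here is a minimal sketch of a partly vectorized pure-NumPy variant (Python 3, NumPy 1.17 or later). It is not the method used in this answer. It draws truck indices in batches and uses a cumulative sum to find where the span is exceeded. The name monte_carlo_batched and the batch parameter are assumptions, and the sketch assumes head plus every length is positive so the loop terminates. Because it consumes random numbers differently, results agree with the original only in distribution.

```python
import numpy as np


def monte_carlo_batched(df, head, span, batch=64, rng=None):
    """One simulation run; draws `batch` random trucks at a time."""
    rng = np.random.default_rng() if rng is None else rng
    lengths = df[:, 2]
    weights = df[:, 1]

    # Pick the initial truck (not checked against the span, as in the original).
    first = rng.integers(0, len(df))
    full_length = lengths[first]
    full_weight = weights[first]

    while True:
        idx = rng.integers(0, len(df), size=batch)
        # Running bridge length after adding each truck in this batch.
        cum_len = full_length + np.cumsum(head + lengths[idx])
        over = np.nonzero(cum_len > span)[0]
        if over.size:
            k = over[0]  # first truck that does not fit
            # Only the trucks before position k contribute weight.
            return (full_weight + weights[idx[:k]].sum()) / span
        # The whole batch fits: keep its totals and draw another batch.
        full_length = cum_len[-1]
        full_weight += weights[idx].sum()
```

A call such as monte_carlo_batched(df, 15.0, 1000.0) replaces one call of the original function. Whether it is faster depends on how many trucks fit on the span, since each batch carries fixed NumPy overhead.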

Answer 2

Compiling the function with Cython can give a very large improvement in run time.

In a separate file named "funcs.pyx" I have the following code:

cimport cython
import numpy as np
cimport numpy as np


def MonteCarlo(np.ndarray[np.float_t, ndim=2] df, double head, double span):
    # Pick initial truck
    cdef int rnd_truck = np.random.randint(0,len(df))
    # C double matches the array's float64 data; a C float would be 32-bit
    cdef double full_length = df[rnd_truck,2]
    cdef double full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

Everything is the same except for the type declarations in front of the variables.

This is the file I used to test it:

import numpy as np
import pyximport
pyximport.install(reload_support=True, setup_args={'include_dirs':[np.get_include()]})
import funcs

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

df = np.random.rand(1000000,3)
reload(funcs)
%timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
%timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]

I only ran it 10,000 times, but even so the improvement is large.

16:42:30: In [31]: %timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
10 loops, best of 3: 131 ms per loop

16:42:37: In [32]: %timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]
1 loops, best of 3: 1.75 s per loop
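
The timings above rely on IPython's %timeit magic. To repeat the comparison in a plain script, a minimal sketch using the standard library's timeit module could look like this (Python 3). The helper name time_calls and its parameters are assumptions made for illustration.

```python
import timeit


def time_calls(func, args, calls=10_000, repeat=3):
    """Return the best time in seconds for `calls` invocations of func(*args)."""
    def batch():
        for _ in range(calls):
            func(*args)

    # Run the whole batch `repeat` times and keep the fastest run.
    return min(timeit.repeat(batch, number=1, repeat=repeat))
```

For example, time_calls(funcs.MonteCarlo, (df, 15, 1000)) and time_calls(MonteCarlo, (df, 15, 1000)) would time the compiled and pure-Python versions over 10,000 calls each.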

Answer 3

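It should be pointed out that Monte Carlo is embarrassingly parallel. Whichever solution you choose, you should do something to parallelize it. Using @Dougal's answer: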

from multiprocessing import Pool

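def RunVMC(n):
    # n is the task index supplied by pool.map; it is not used.
    # head=15 and span=1000 match the timings below.
    return MonteCarlo_cy(df, 15, 1000)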


pool=Pool(processes=4)

%timeit [MonteCarlo_cy(df,15,1000) for x in range(1000000)]
1 loops, best of 3: 3.89 s per loop

#Pool @ 4
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 0.973 s per loop

#Pool @ 8
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 568 ms per loop
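
One caveat when parallelizing random simulations: worker processes created by fork inherit a copy of NumPy's global random state, so unless each worker is reseeded they can produce the same random sequence. The following is a minimal Python 3 sketch (NumPy 1.17 or later) of one way to give each worker an independent generator through SeedSequence.spawn and to hand each worker one chunk of runs. It uses a pure-Python inner loop instead of MonteCarlo_cy so that it is self-contained. The names simulate_once, run_chunk and run_parallel are assumptions, not part of this answer.

```python
from multiprocessing import Pool

import numpy as np


def simulate_once(lengths, weights, head, span, rng):
    # Same logic as the original loop, drawing from an explicit generator.
    i = rng.integers(0, len(lengths))
    full_length = lengths[i]
    full_weight = weights[i]
    while True:
        i = rng.integers(0, len(lengths))
        full_length += head + lengths[i]
        if full_length > span:
            return full_weight / span
        full_weight += weights[i]


def run_chunk(task):
    # Each task carries its own seed, so workers do not share a random stream.
    df, head, span, n_runs, seed_seq = task
    rng = np.random.default_rng(seed_seq)
    lengths, weights = df[:, 2], df[:, 1]
    return [simulate_once(lengths, weights, head, span, rng)
            for _ in range(n_runs)]


def run_parallel(df, head, span, n_runs, processes=4, seed=0):
    seeds = np.random.SeedSequence(seed).spawn(processes)
    # Split n_runs as evenly as possible across the workers.
    counts = [n_runs // processes + (k < n_runs % processes)
              for k in range(processes)]
    # Note: df is pickled once per task, which costs time for a large array.
    tasks = [(df, head, span, c, s) for c, s in zip(counts, seeds)]
    with Pool(processes=processes) as pool:
        chunks = pool.map(run_chunk, tasks)
    return np.concatenate(chunks)
```

In a script, run_parallel(df, 15.0, 1000.0, 1000000) should be called under an if __name__ == "__main__": guard so that worker processes can import the module safely.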

Answer 4

You could try switching to another Python implementation. Jython is a bit faster than Python, and in some cases PyPy is a lot faster. Give them both a try.

Solution 1
3, accepted, 2013-08-08 20:29:01

Solution 2
2, 2013-08-08 20:53:21

Solution 3
2, 2013-08-08 21:05:50

Solution 4
0, 2013-08-08 19:49:02