
Optimizing a Python function with NumPy arrays

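I have spent the past two days trying to optimize a Python script I wrote. Using several profiling tools (cProfile, line_profiler, etc.) I narrowed the problem down to the function below.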

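df is a numpy array with 3 columns and more than 1,000,000 rows (data type float). With line_profiler I found that the function spends most of its time wherever it has to access the numpy array.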

The lines

full_length += head + df[rnd_truck, 2]

full_weight += df[rnd_truck,1]

take most of the time, followed by the lines

full_length = df[rnd_truck,2]

full_weight = df[rnd_truck,1]

As far as I can tell, the bottleneck is the access time the function needs to fetch a single number from the numpy array.

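When I run the function as MonteCarlo(df, 15., 1000.), calling it takes 37 seconds on an i7 3.40 GHz 64-bit Windows machine with 8 GB of RAM. In my application I need to run it 1,000,000,000 times to ensure convergence, which pushes the execution time to more than an hour. I tried using operator.add for the summation, but it did not help at all. It looks like I have to come up with a faster way to access this numpy array.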

Any ideas would be welcome!

import numpy as np

def MonteCarlo(df, head, span):
    # Pick initial truck
    rnd_truck = np.random.randint(0, len(df))
    full_length = df[rnd_truck, 2]
    full_weight = df[rnd_truck, 1]

    # Keep adding random trucks until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        full_weight += df[rnd_truck, 1]

    # Return average weight per foot on the bridge
    return full_weight / span

Below is part of the df numpy array I am using:

In [31]: df
Out[31]: 
array([[  12. ,  220.4,  108.4],
       [  11. ,  220.4,  106.2],
       [  11. ,  220.3,  113.6],
       ..., 
       [   4. ,   13.9,   36.8],
       [   3. ,   13.7,   33.9],
       [   3. ,   13.7,   10.7]])
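To check which part of a loop iteration really dominates, the individual operations can be timed in isolation. The snippet below is only a rough sketch: the array is random stand-in data rather than the real truck table, and time_us is a hypothetical helper name. It uses only the standard timeit module and NumPy.

```python
import timeit
import numpy as np

# Stand-in data (assumption): random values, not the real truck table.
df = np.random.rand(100_000, 3)

# Names made visible to the timed statements.
namespace = {"np": np, "df": df, "n": len(df), "x": 0.0}

def time_us(stmt, number=100_000):
    """Average time of one execution of stmt, in microseconds."""
    seconds = timeit.timeit(stmt, globals=namespace, number=number)
    return seconds / number * 1e6

# Time each ingredient of one loop iteration separately.
timings = {
    stmt: time_us(stmt)
    for stmt in (
        "np.random.randint(0, n)",   # drawing one random row index
        "df[123, 2]",                # reading one element of the array
        "x + 15.0 + df[123, 2]",     # the accumulation step
    )
}

for stmt, micros in timings.items():
    print(f"{stmt:<28} {micros:8.3f} us")
```

Comparing the three numbers shows whether the array access or the scalar random draw costs more per iteration on a given machine.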

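As others have pointed out, this is not vectorized at all, so the slowness is really due to the slowness of the Python interpreter. Cython can help you a lot here with minimal changes: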

>>> %timeit MonteCarlo(df, 5, 1000)
10000 loops, best of 3: 48 us per loop

>>> %timeit MonteCarlo_cy(df, 5, 1000)
100000 loops, best of 3: 3.67 us per loop

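where MonteCarlo_cy is defined as follows (in an IPython notebook, after %load_ext cythonmagic; newer Cython releases load the same magic with %load_ext Cython):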

%%cython
import numpy as np
cimport numpy as np

def MonteCarlo_cy(double[:, ::1] df, double head, double span):
    # Pick initial truck
    cdef long n = df.shape[0]
    cdef long rnd_truck = np.random.randint(0, n)
    cdef double full_weight = df[rnd_truck, 1]
    cdef double full_length = df[rnd_truck, 2]

    # Loop using other random truck until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, n)
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per feet on the bridge
    return full_weight / span
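For comparison, here is what a vectorized NumPy-only version could look like. It is a sketch of a different technique from the Cython function above: instead of speeding up one run, it simulates many bridges in a single call. The name monte_carlo_many is hypothetical, and the code assumes head > 0, non-negative truck lengths, and NumPy 1.17 or later for np.random.default_rng.

```python
import numpy as np

def monte_carlo_many(df, head, span, n_sims, rng=None):
    """Simulate n_sims bridges at once; returns one weight-per-length value each."""
    rng = np.random.default_rng() if rng is None else rng

    # Assumption: head > 0 and lengths >= 0, so after the first truck at most
    # floor(span / head) more trucks fit; one extra column guarantees overflow.
    max_trucks = int(span // head) + 2

    # One row of random truck indices per simulated bridge.
    idx = rng.integers(0, len(df), size=(n_sims, max_trucks))
    lengths = df[idx, 2]          # fancy indexing returns a copy
    weights = df[idx, 1]

    # Every truck after the first one also adds the headway.
    lengths[:, 1:] += head
    total = np.cumsum(lengths, axis=1)

    # A truck counts if the bridge is still within span after adding it.
    # The first truck is always counted, as in the loop version.
    fits = total <= span
    fits[:, 0] = True

    return (weights * fits).sum(axis=1) / span
```

A call such as monte_carlo_many(df, 15.0, 1000.0, n_sims=100_000).mean() would replace 100,000 separate calls. Memory grows with n_sims * max_trucks, so very large runs are better split into chunks. Whether this beats the Cython version depends on the data and the machine, so it is worth timing both.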

Compiling the function with Cython gives a very large improvement in runtime.

In a separate file named "funcs.pyx" I have the following code:

cimport cython
import numpy as np
cimport numpy as np


def MonteCarlo(np.ndarray[np.float_t, ndim=2] df, float head, float span):
    # Pick initial truck
    cdef int rnd_truck = np.random.randint(0,len(df))
    cdef float full_length = df[rnd_truck,2]
    cdef float full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

Everything is the same except for the type declarations in front of the variables.

Here is the file I used to test it:

import numpy as np
import pyximport
pyximport.install(reload_support=True, setup_args={'include_dirs':[np.get_include()]})
import funcs

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

df = np.random.rand(1000000,3)
reload(funcs)
%timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
%timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]

I only ran it 10,000 times, but even so the improvement is large.

16:42:30: In [31]: %timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
10 loops, best of 3: 131 ms per loop

16:42:37: In [32]: %timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]
1 loops, best of 3: 1.75 s per loop

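It should be pointed out that Monte Carlo is embarrassingly parallel. Whichever solution you choose, you should do something to parallelize it. Using @Dougal's answer: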

from multiprocessing import Pool

def RunVMC(n):
    # n is only the task index; head=15 and span=1000 match the timings below
    return MonteCarlo_cy(df, 15, 1000)


pool=Pool(processes=4)

%timeit [MonteCarlo_cy(df,15,1000) for x in range(1000000)]
1 loops, best of 3: 3.89 s per loop

#Pool @ 4
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 0.973 s per loop

#Pool @ 8
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 568 ms per loop
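One caveat when parallelizing this way: on platforms where multiprocessing forks, the workers can inherit the same global np.random state and then draw identical "random" trucks. The sketch below shows one possible way around that, giving each chunk of work its own stream via np.random.SeedSequence.spawn. The names simulate_chunk, make_tasks and run_parallel are hypothetical, the simulation loop is a plain-Python stand-in for MonteCarlo_cy, and NumPy 1.17 or later is assumed.

```python
import numpy as np
from multiprocessing import Pool

def simulate_chunk(task):
    """Run a batch of simulations using a random stream private to this task."""
    df, head, span, n_runs, seed = task
    rng = np.random.default_rng(seed)
    n = len(df)
    out = np.empty(n_runs)
    for i in range(n_runs):
        truck = rng.integers(0, n)
        full_length = df[truck, 2]
        full_weight = df[truck, 1]
        while True:
            truck = rng.integers(0, n)
            full_length += head + df[truck, 2]
            if full_length > span:
                break
            full_weight += df[truck, 1]
        out[i] = full_weight / span
    return out

def make_tasks(df, head, span, n_runs, n_chunks, seed=0):
    """Split n_runs into n_chunks tasks, each with an independent child seed."""
    seeds = np.random.SeedSequence(seed).spawn(n_chunks)
    base, extra = divmod(n_runs, n_chunks)
    counts = [base + (i < extra) for i in range(n_chunks)]
    return [(df, head, span, c, s) for c, s in zip(counts, seeds)]

def run_parallel(df, head, span, n_runs, processes=4, seed=0):
    """Map the chunks over a process pool and join the results."""
    tasks = make_tasks(df, head, span, n_runs, processes, seed)
    with Pool(processes=processes) as pool:
        return np.concatenate(pool.map(simulate_chunk, tasks))
```

run_parallel should be called from under an if __name__ == "__main__": guard, which Windows requires. Sending a few large chunks instead of a million tiny tasks also keeps the inter-process overhead small, although df is pickled once per task here.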

You could try switching to another Python implementation. Jython is a bit faster than Python, and in some cases PyPy is a lot faster. Give them both a try.
