Optimizing indexing and retrieval of elements in NumPy arrays in Python?
Optimizing a Python function with NumPy arrays
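I have spent the past two days trying to optimize a Python script I wrote. Using several profiling tools (cProfile, line_profiler, etc.) I narrowed the problem down to the function below.
df
is a NumPy array with 3 columns and more than 1,000,000 rows (data type float). Using line_profiler, I found that the function spends most of its time wherever it has to access the NumPy array.
full_length += head + df[rnd_truck, 2]
and
full_weight += df[rnd_truck,1]
take most of the time, followed by the
full_length = df[rnd_truck,2]
full_weight = df[rnd_truck,1]
lines.
As far as I can tell, the bottleneck is the access time each time the function fetches a single number from the NumPy array.
When I run the function as MonteCarlo(df, 15., 1000.)
calling it takes 37 seconds on a 64-bit Windows machine with a 3.40 GHz i7 and 8 GB of RAM. In my application I need to run it 1,000,000,000 times to ensure convergence, which pushes the execution time to more than an hour. I tried using operator.add
for the summation, but it did not help at all. It looks like I will have to find a faster way to access this NumPy array.
Any ideas would be welcome!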
import numpy as np

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per foot on the bridge
    return(full_weight/span)
Below is part of the df NumPy array I am using:
In [31]: df
Out[31]:
array([[  12. ,  220.4,  108.4],
       [  11. ,  220.4,  106.2],
       [  11. ,  220.3,  113.6],
       ...,
       [   4. ,   13.9,   36.8],
       [   3. ,   13.7,   33.9],
       [   3. ,   13.7,   10.7]])
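One low-effort experiment for the access-time question, sketched under assumptions: indexing a single element of a NumPy array from Python creates a NumPy scalar object on every access, so copying the two columns into plain Python lists once and indexing those may reduce the per-access cost. The name monte_carlo_lists and the use of the standard random module are illustrative choices that do not come from the original post, and no speedup figure is claimed here.

```python
import random

import numpy as np


def monte_carlo_lists(weights, lengths, head, span, rng=random):
    """Same loop as MonteCarlo, but reading from plain Python lists."""
    n = len(lengths)
    # Pick initial truck
    i = rng.randrange(n)
    full_length = lengths[i]
    full_weight = weights[i]
    # Keep adding random trucks until the bridge is full
    while True:
        i = rng.randrange(n)
        full_length += head + lengths[i]
        if full_length > span:
            break
        full_weight += weights[i]
    # Average weight per foot on the bridge
    return full_weight / span


# Stand-in data with the same layout as df (column 1 = weight, column 2 = length)
df = np.random.rand(1000, 3)
weights = df[:, 1].tolist()  # one-time conversion to Python floats
lengths = df[:, 2].tolist()
result = monte_carlo_lists(weights, lengths, 15.0, 1000.0)
```

The conversion cost is paid once, so it only makes sense when the function is called many times on the same array, as it is here.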
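As others have pointed out, this is not vectorized at all, so the slowness really comes from the Python interpreter itself. Cython can help you a lot with minimal changes: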
>>> %timeit MonteCarlo(df, 5, 1000)
10000 loops, best of 3: 48 us per loop
>>> %timeit MonteCarlo_cy(df, 5, 1000)
100000 loops, best of 3: 3.67 us per loop
where MonteCarlo_cy is defined as follows (in an IPython notebook, after running %load_ext cythonmagic):
%%cython
import numpy as np
cimport numpy as np

def MonteCarlo_cy(double[:, ::1] df, double head, double span):
    # Pick initial truck
    cdef long n = df.shape[0]
    cdef long rnd_truck = np.random.randint(0, n)
    cdef double full_weight = df[rnd_truck, 1]
    cdef double full_length = df[rnd_truck, 2]
    # Loop using other random truck until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, n)
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]
    # Return average weight per foot on the bridge
    return full_weight / span
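The remark about vectorization can also be illustrated in plain NumPy. The following is only a sketch, not code from any of the answers: it draws truck indices in batches and uses a cumulative sum to find how many trucks still fit, instead of fetching one element per loop iteration. The name monte_carlo_batched, the batch parameter, and the use of np.random.default_rng are assumptions made for illustration, and the sketch assumes head plus every truck length is non-negative so that the running length never decreases.

```python
import numpy as np


def monte_carlo_batched(df, head, span, rng, batch=64):
    """One simulation run that draws random trucks in batches."""
    weights = df[:, 1]
    lengths = df[:, 2]
    # Pick initial truck
    first = rng.integers(0, len(df))
    full_length = lengths[first]
    full_weight = weights[first]
    while True:
        idx = rng.integers(0, len(df), size=batch)
        # Bridge length after each additional truck in this batch
        cum_len = full_length + np.cumsum(head + lengths[idx])
        # Count the trucks whose running length is still <= span
        n_fit = np.searchsorted(cum_len, span, side="right")
        full_weight += weights[idx[:n_fit]].sum()
        if n_fit < batch:
            # The next truck in the batch overflowed the bridge
            return full_weight / span
        full_length = cum_len[-1]


# Stand-in data with the same layout as df (column 1 = weight, column 2 = length)
rng = np.random.default_rng(0)
df = rng.random((1000, 3))
result = monte_carlo_batched(df, 15.0, 1000.0, rng)
```

A batch size close to the expected number of trucks per bridge keeps most runs to one or two iterations; whether this beats the Cython version would have to be measured.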
Compiling the function with Cython gives a very large improvement in run time.
In a separate file named "funcs.pyx" I have the following code:
cimport cython
import numpy as np
cimport numpy as np

def MonteCarlo(np.ndarray[np.float_t, ndim=2] df, float head, float span):
    # Pick initial truck
    cdef int rnd_truck = np.random.randint(0,len(df))
    cdef float full_length = df[rnd_truck,2]
    cdef float full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per foot on the bridge
    return(full_weight/span)
Apart from the type declarations in front of the variables, everything is the same.
This is the file I used to test it:
import numpy as np
import pyximport
pyximport.install(reload_support=True, setup_args={'include_dirs':[np.get_include()]})
import funcs

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per foot on the bridge
    return(full_weight/span)

df = np.random.rand(1000000,3)
reload(funcs)  # Python 2 builtin; on Python 3 use importlib.reload(funcs)
%timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
%timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]
I only ran it 10,000 times, but even so the improvement is large.
16:42:30: In [31]: %timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
10 loops, best of 3: 131 ms per loop
16:42:37: In [32]: %timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]
1 loops, best of 3: 1.75 s per loop
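It is worth pointing out that Monte Carlo is embarrassingly parallel. Whichever solution you choose, you should do something to parallelize it. Using @Dougal's answer: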
from multiprocessing import Pool

# head and span are not defined in the original snippet; assumed here to
# match the arguments of the serial timing below
head, span = 15, 1000

def RunVMC(n):
    return MonteCarlo_cy(df,head,span)

pool=Pool(processes=4)
%timeit [MonteCarlo_cy(df,15,1000) for x in range(1000000)]
1 loops, best of 3: 3.89 s per loop
#Pool @ 4
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 0.973 s per loop
#Pool @ 8
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 568 ms per loop
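One thing to watch when parallelizing a Monte Carlo run this way: worker processes created by forking can inherit the same state of a global random generator, in which case they would produce identical draws. A minimal sketch of one way around this follows, giving each chunk of runs its own seeded generator. It is a pure-Python stand-in rather than the Cython function above; the names monte_carlo, run_chunk, run_parallel and the base_seed parameter are hypothetical, and trucks is assumed to be a list of (weight, length) pairs.

```python
import random
from multiprocessing import Pool


def monte_carlo(trucks, head, span, rng):
    """Pure-Python stand-in for one simulation run."""
    weight, length = trucks[rng.randrange(len(trucks))]
    full_weight, full_length = weight, length
    while True:
        weight, length = trucks[rng.randrange(len(trucks))]
        full_length += head + length
        if full_length > span:
            return full_weight / span
        full_weight += weight


def run_chunk(args):
    """Run n_runs simulations with a private generator; return their sum."""
    trucks, head, span, n_runs, seed = args
    rng = random.Random(seed)  # each chunk gets its own seed
    return sum(monte_carlo(trucks, head, span, rng) for _ in range(n_runs))


def run_parallel(trucks, head, span, n_runs, processes=4, base_seed=0):
    """Split n_runs into one chunk per process and average all results."""
    chunk, extra = divmod(n_runs, processes)
    sizes = [chunk + (i < extra) for i in range(processes)]
    jobs = [(trucks, head, span, size, base_seed + i)
            for i, size in enumerate(sizes)]
    with Pool(processes=processes) as pool:
        totals = pool.map(run_chunk, jobs)
    return sum(totals) / n_runs
```

When run as a script, run_parallel should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Each job carries a copy of trucks, so for a very large table it may be preferable to load the data once per worker instead.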