How do I vectorize a pandas iterrows loop
I have a large dataframe and I need to loop through it. However, this takes a long time for a very large dataframe. I know iterrows is quite slow and vectorization is much faster, but I don't know how to rewrite an iterrows loop.
My dataframe is as follows:
print(df_toe.head(10))
z_toe dn50_toe Nod ht/h output_ok
0 -3.5 0.067171 NaN NaN 1.0
1 -3.5 0.082472 NaN NaN 1.0
2 -3.5 0.095543 NaN NaN 1.0
3 -3.5 0.196341 NaN NaN 1.0
4 -3.5 0.232024 NaN NaN 1.0
5 -3.5 0.347270 NaN NaN 1.0
6 -3.5 0.353661 NaN NaN 1.0
7 -3.5 0.404841 NaN NaN 1.0
8 -3.5 0.632502 NaN NaN 1.0
9 -3.5 0.922923 NaN NaN 1.0
With some extra parameters:
z_bed = -4.5
swl = 1.8
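To make the example reproducible, the ten printed rows and the two parameters can be rebuilt as a small script. This is a sketch that copies the displayed values, which pandas has rounded to six decimals, so it only approximates the real data.

```python
import numpy as np
import pandas as pd

# Extra parameters from the question
z_bed = -4.5
swl = 1.8

# The ten rows printed by df_toe.head(10); dn50_toe holds the rounded display values
df_toe = pd.DataFrame({
    'z_toe': [-3.5] * 10,
    'dn50_toe': [0.067171, 0.082472, 0.095543, 0.196341, 0.232024,
                 0.347270, 0.353661, 0.404841, 0.632502, 0.922923],
    'Nod': np.nan,       # filled in later by the loop
    'ht/h': np.nan,      # filled in later by the loop
    'output_ok': 1.0,    # 1 = row still valid, 0 = rejected
})

print(df_toe.head(10))
```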
The loop over the dataframe df_toe is written as follows:
def dftoe_det_2nd(df_toe):
    for i in df_toe.index:
        # Define input variables
        # (.at replaces get_value/set_value, which were removed in pandas 1.0)
        z_toe = df_toe.at[i, 'z_toe']
        dn50_toe = df_toe.at[i, 'dn50_toe']
        # Define restrictions between which it can operate for z_toe/h
        h = swl - z_bed
        ht = swl - z_toe
        df_toe.at[i, 'ht/h'] = abs(ht / h)
        if z_toe < z_bed:
            df_toe.at[i, 'output_ok'] = 0
        # Show all water heights (Nodtoe() is defined elsewhere, not shown)
        df_toe.at[i, 'Nod'] = Nodtoe()
        if 0.90 < abs(ht / h) or 0.4 > abs(ht / h):
            df_toe.at[i, 'output_ok'] = 0
        if h > 25:
            df_toe.at[i, 'output_ok'] = 0
    # Keep only the rows that passed every restriction
    df_toe = df_toe[df_toe['output_ok'] == 1]
    del df_toe['output_ok']
    return df_toe
Does anyone know how this can be optimized in terms of speed and computation time?
You can follow https://stackoverflow.com/a/28490706/3528612 and try OpenMP over the loop. Or, if you have the resources (i.e. more processors), you can try mpi4py and parallelize the loop in small chunks to make it faster.
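A different route from parallelizing is to vectorize the loop, which is what the title asks about: every restriction in dftoe_det_2nd can be expressed as a boolean mask over whole columns. The sketch below assumes `Nodtoe()` returns the same scalar for every row (it is called without arguments in the question) and passes that value in through a hypothetical `nod_value` parameter. The function name `dftoe_det_2nd_vectorized` is also hypothetical, and `swl` and `z_bed` become arguments instead of globals.

```python
import numpy as np
import pandas as pd


def dftoe_det_2nd_vectorized(df_toe, swl, z_bed, nod_value=np.nan):
    """Column-wise version of dftoe_det_2nd (hypothetical name).

    Assumption: Nodtoe() takes no arguments in the question, so it is
    treated as returning one scalar for every row and is passed in once
    as nod_value instead of being called per row.
    """
    df = df_toe.copy()                         # do not modify the caller's frame
    h = swl - z_bed                            # scalar, identical for every row
    ratio = ((swl - df['z_toe']) / h).abs()    # |ht / h| for all rows at once

    df['ht/h'] = ratio
    df['Nod'] = nod_value                      # broadcast the scalar to every row

    # Start from the existing flag, then apply each restriction as a mask
    ok = df['output_ok'] == 1
    ok &= ~(df['z_toe'] < z_bed)               # toe below the bed
    ok &= ~((ratio > 0.90) | (ratio < 0.4))    # ht/h outside [0.4, 0.9]
    if h > 25:
        # h does not depend on the row, so this rejects every row
        ok[:] = False

    return df.loc[ok].drop(columns='output_ok')
```

With the data from the question, `dftoe_det_2nd_vectorized(df_toe, swl=1.8, z_bed=-4.5, nod_value=Nodtoe())` should keep all ten rows, since h = 6.3 and |ht/h| = 5.3/6.3 ≈ 0.84 lies inside the accepted range. The comparisons run on whole columns, so the per-row Python overhead of `.at` lookups and repeated function calls disappears. If `Nodtoe()` actually depends on the row, it would need a vectorized form of its own.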