How do I vectorize a pandas iterrows loop

I have a large dataframe and I need to loop through it. However, this takes a long time for a very large dataframe. I know iterrows is quite slow and vectorization is much faster, but I don't know how to rewrite an iterrows loop.

My dataframe is given as follows:

print(df_toe.head(10))

   z_toe  dn50_toe  Nod  ht/h  output_ok
0   -3.5  0.067171  NaN   NaN        1.0
1   -3.5  0.082472  NaN   NaN        1.0
2   -3.5  0.095543  NaN   NaN        1.0
3   -3.5  0.196341  NaN   NaN        1.0
4   -3.5  0.232024  NaN   NaN        1.0
5   -3.5  0.347270  NaN   NaN        1.0
6   -3.5  0.353661  NaN   NaN        1.0
7   -3.5  0.404841  NaN   NaN        1.0
8   -3.5  0.632502  NaN   NaN        1.0
9   -3.5  0.922923  NaN   NaN        1.0

With some extra parameters:

z_bed = -4.5 
swl = 1.8

The loop through the dataframe df_toe is written as follows:

def dftoe_det_2nd(df_toe):

    for i in df_toe.index:
        # Define input variables
        z_toe = df_toe.at[i, 'z_toe']
        dn50_toe = df_toe.at[i, 'dn50_toe']

        # Define restrictions between which it can operate for z_toe/h
        h = swl - z_bed
        ht = swl - z_toe
        df_toe.at[i, 'ht/h'] = abs(ht / h)

        if z_toe < z_bed:
            df_toe.at[i, 'output_ok'] = 0

        # Show all water heights (Nodtoe() is defined elsewhere and not shown here)
        df_toe.at[i, 'Nod'] = Nodtoe()

        if 0.90 < abs(ht / h) or 0.4 > abs(ht / h):
            df_toe.at[i, 'output_ok'] = 0

        if h > 25:
            df_toe.at[i, 'output_ok'] = 0

    df_toe = df_toe[df_toe['output_ok'] == 1]
    del df_toe['output_ok']
    return df_toe

Does anyone know how this can be optimized in terms of speed and computation time?

You can follow https://stackoverflow.com/a/28490706/3528612 and try OpenMP over the loop. Or, if you have the resources, i.e. more processors, you can try mpi4py and parallelize the loop into small chunks to make it faster.
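Before reaching for parallelism, it may be worth checking whether the loop can be replaced by column-wise operations, since every check in the question depends only on values in the same row. The following is a minimal sketch of that idea, not a drop-in replacement. It assumes Nodtoe() does not depend on the row; because that function is not shown in the question, the hypothetical helper nodtoe_placeholder stands in for it and simply returns NaN. The name dftoe_det_2nd_vectorized and the keyword arguments are also illustrative.

```python
import numpy as np
import pandas as pd

z_bed = -4.5
swl = 1.8


def nodtoe_placeholder(n_rows):
    # Hypothetical stand-in for Nodtoe(), which the question does not show.
    # Assumption: one value per row; NaN is used here as a dummy result.
    return np.full(n_rows, np.nan)


def dftoe_det_2nd_vectorized(df_toe, z_bed=z_bed, swl=swl):
    # Work on a copy so the caller's dataframe is left untouched.
    df = df_toe.copy()

    h = swl - z_bed            # scalar: same for every row
    ht = swl - df['z_toe']     # Series: one value per row
    ratio = (ht / h).abs()

    df['ht/h'] = ratio
    df['Nod'] = nodtoe_placeholder(len(df))

    # Same rejection rules as the loop, expressed as boolean masks.
    invalid = (df['z_toe'] < z_bed) | (ratio > 0.90) | (ratio < 0.4)
    invalid = invalid | (h > 25)   # scalar check applies to all rows

    # Keep rows that were flagged ok on input and pass every check.
    keep = (df['output_ok'] == 1) & ~invalid
    return df.loc[keep].drop(columns='output_ok')


# Small illustrative input (values made up for the example).
df_demo = pd.DataFrame({
    'z_toe': [-3.5, -5.0, 0.0],
    'dn50_toe': [0.067171, 0.082472, 0.095543],
    'Nod': np.nan,
    'ht/h': np.nan,
    'output_ok': 1.0,
})
result = dftoe_det_2nd_vectorized(df_demo)
print(result)
```

With these inputs only the first row should survive: the second lies below z_bed and the third has an ht/h ratio under 0.4. If Nodtoe() actually depends on per-row values, it would need to be rewritten to accept whole columns for this approach to pay off.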
