How to vectorize a pandas iterrows loop

Question

I have a large dataframe that I need to loop through. However, the loop takes a long time when the dataframe is very large. I know iterrows is quite slow and vectorization is much faster, but I don't know how to rewrite an iterrows loop in vectorized form.

My dataframe is given as follows:

print(df_toe.head(10))

   z_toe  dn50_toe  Nod  ht/h  output_ok
0   -3.5  0.067171  NaN   NaN        1.0
1   -3.5  0.082472  NaN   NaN        1.0
2   -3.5  0.095543  NaN   NaN        1.0
3   -3.5  0.196341  NaN   NaN        1.0
4   -3.5  0.232024  NaN   NaN        1.0
5   -3.5  0.347270  NaN   NaN        1.0
6   -3.5  0.353661  NaN   NaN        1.0
7   -3.5  0.404841  NaN   NaN        1.0
8   -3.5  0.632502  NaN   NaN        1.0
9   -3.5  0.922923  NaN   NaN        1.0

With some extra parameters:

z_bed = -4.5 
swl = 1.8

The iterrows-style loop over the dataframe df_toe is written as follows:

def dftoe_det_2nd(df_toe):
    # z_bed, swl and Nodtoe() are defined outside this function

    for i in df_toe.index:
        # Define input variables
        # (.at replaces get_value/set_value, which were removed in pandas 1.0)
        z_toe = df_toe.at[i, 'z_toe']
        dn50_toe = df_toe.at[i, 'dn50_toe']

        # Define restrictions between which it can operate for z_toe/h
        h = swl - z_bed
        ht = swl - z_toe
        df_toe.at[i, 'ht/h'] = abs(ht / h)

        if z_toe < z_bed:
            df_toe.at[i, 'output_ok'] = 0

        # Show all water heights
        df_toe.at[i, 'Nod'] = Nodtoe()

        if 0.90 < abs(ht / h) or 0.4 > abs(ht / h):
            df_toe.at[i, 'output_ok'] = 0

        if h > 25:
            df_toe.at[i, 'output_ok'] = 0

    # Keep only the rows that passed every check, then drop the helper column
    df_toe = df_toe[df_toe['output_ok'] == 1]
    del df_toe['output_ok']
    return df_toe

Does anyone know how this can be optimized for speed and computation time?

Answer 1

You can follow https://stackoverflow.com/a/28490706/3528612 and try OpenMP over the loop. Or, if you have the resources, i.e. more processors, you can try mpi4py and split the loop into small chunks that run in parallel to make this faster.
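
Before reaching for OpenMP or mpi4py, it may be worth checking whether the loop body can be written as column-wise operations, since every check in the question depends only on the row's own z_toe and the two constants. Below is a rough sketch of that idea, not a drop-in replacement. The function name dftoe_det_2nd_vectorized and the nod_values argument are made up for illustration; Nodtoe() is not shown in the question, so its result is assumed to be computable separately and passed in.

```python
import pandas as pd

# Constants from the question
z_bed = -4.5
swl = 1.8


def dftoe_det_2nd_vectorized(df_toe, nod_values=None):
    """Column-wise version of the row loop (hypothetical name).

    nod_values stands in for whatever Nodtoe() returns; that function is
    not shown in the question, so this argument is an assumption.
    """
    df = df_toe.copy()  # leave the caller's dataframe untouched

    h = swl - z_bed  # scalar, identical for every row
    ratio = ((swl - df['z_toe']) / h).abs()
    df['ht/h'] = ratio

    if nod_values is not None:
        df['Nod'] = nod_values

    # Mirror the loop: a row is rejected if any of the three checks fires
    rejected = (
        (df['z_toe'] < z_bed)
        | (ratio > 0.90)
        | (ratio < 0.4)
        | (h > 25)
    )
    keep = (df['output_ok'] == 1) & ~rejected
    return df.loc[keep].drop(columns='output_ok')
```

Calling dftoe_det_2nd_vectorized(df_toe) should then return the same filtered rows as the loop, with Nod left as it was unless nod_values is supplied. If Nodtoe() really has to be evaluated row by row, only that part would still need a loop or apply.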

Solution 1
0 2019-08-15 11:10:18
