

How to speed up a very slow pandas apply function?

I have a very large pandas dataset, and at some point I need to use the following function:

def proc_trader(data):
    data['_seq'] = np.nan
    # mark every ending of a roundtrip with its index
    data.loc[data.cumq == 0, 'tag'] = np.arange(1, (data.cumq == 0).sum() + 1)
    # backfill the roundtrip index until the previous roundtrip;
    # then fill the rest with 0s (roundtrip incomplete for most recent trades)
    data['_seq'] = data['tag'].bfill().fillna(0)
    return data['_seq']
    # btw, why on earth does this function return a dataframe instead of the series `data['_seq']`??

and I use apply.


Obviously, I cannot share the data here, but do you see a bottleneck in my code? Could it be the arange thing? There are many name-productid combinations in the data.

Minimal Working Example:

import pandas as pd
import numpy as np

reshaped = pd.DataFrame({
    'trader': ['a', 'a', 'a', 'a', 'a', 'a', 'a'],
    'stock': ['a', 'a', 'a', 'a', 'a', 'a', 'b'],
    'day': [0, 1, 2, 4, 5, 10, 1],
    'delta': [10, -10, 15, -10, -5, 5, 0],
    'out': [1, 1, 2, 2, 2, 0, 1],
})

reshaped.sort_values(by=['trader', 'stock', 'day'], inplace=True)
reshaped['cumq'] = reshaped.groupby(['trader', 'stock']).delta.transform('cumsum')

Nothing really fancy here, just tweaked in a couple of places. There is really no need to put it in a function, so I didn't. On this tiny sample data, it's about twice as fast as the original.

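reshaped.sort_values(by=['trader', 'stock', 'day'], inplace=True)
reshaped['cumq'] = reshaped.groupby(['trader', 'stock']).delta.cumsum()
# flag the rows that close a roundtrip (cumulative position back to zero)
reshaped.loc[reshaped.cumq == 0, '_spell'] = 1
# number the closing rows within each trader/stock group
reshaped['_spell'] = reshaped.groupby(['trader', 'stock'])['_spell'].cumsum()
# backfill within each group; trades after the last closed roundtrip get 0
reshaped['_spell'] = reshaped.groupby(['trader', 'stock'])['_spell'].bfill().fillna(0)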

Result:

   day  delta  out stock trader  cumq  _spell
0    0     10    1     a      a    10     1.0
1    1    -10    1     a      a     0     1.0
2    2     15    2     a      a    15     2.0
3    4    -10    2     a      a     5     2.0
4    5     -5    2     a      a     0     2.0
5   10      5    0     a      a     5     0.0
6    1      0    1     b      a     0     1.0
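
If you want to convince yourself that the vectorized version gives the same numbers as a per-group function, here is a minimal sketch on the sample data. The exact apply call from the question is not shown, so the `groupby(...).apply(...)` line is only an assumption about how it was used, and `roundtrip_index` is a hypothetical stand-in for `proc_trader` that takes just the `cumq` column of one group.

```python
import numpy as np
import pandas as pd

KEYS = ['trader', 'stock']

reshaped = pd.DataFrame({
    'trader': ['a'] * 7,
    'stock': ['a'] * 6 + ['b'],
    'day': [0, 1, 2, 4, 5, 10, 1],
    'delta': [10, -10, 15, -10, -5, 5, 0],
})
reshaped = reshaped.sort_values(KEYS + ['day'])
reshaped['cumq'] = reshaped.groupby(KEYS)['delta'].cumsum()


def roundtrip_index(cumq):
    # Hypothetical per-group helper in the spirit of proc_trader:
    # number each row where the position returns to zero,
    # backfill that number, and give unfinished roundtrips a 0.
    ends = cumq == 0
    tag = pd.Series(np.nan, index=cumq.index)
    tag[ends] = np.arange(1, ends.sum() + 1)
    return tag.bfill().fillna(0)


# Assumed call pattern: one Python-level function call per trader/stock group.
per_group = reshaped.groupby(KEYS, group_keys=False)['cumq'].apply(roundtrip_index)

# Vectorized version from the answer: only grouped cumsum and bfill.
vec = reshaped.copy()
vec.loc[vec['cumq'] == 0, '_spell'] = 1
vec['_spell'] = vec.groupby(KEYS)['_spell'].cumsum()
vec['_spell'] = vec.groupby(KEYS)['_spell'].bfill().fillna(0)

# Compare the two results row by row, aligned on the original index.
same = np.array_equal(per_group.reindex(vec.index).to_numpy(),
                      vec['_spell'].to_numpy())
print(same)
```

The per-group version runs a Python function once for every trader/stock combination, so its cost grows with the number of groups, while the vectorized version makes a fixed number of grouped passes. Wrapping each variant in `timeit` on your own data would show how large the gap is; no timings are claimed here beyond the rough figure above.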

