
Alternative to 'Apply' on Pandas GroupBy Object

So I have the following:

timeDiffFunc = lambda x: x['CP_EX_DT'] - x['CP_EX_DT'].shift(1)
exTimeDiff = assetGrp.apply(timeDiffFunc).fillna(0).reset_index(level=1)

But this uses so much memory that my system crashes (similar to the issue seen here: Memory leak in Pandas.groupby.apply()?).

My question is, how can I convert this to code that does not use the apply function? I tried variations of:

for i, (name,grp) in enumerate(assetGrp):
  grp = grp['CP_EX_DT'] - grp['CP_EX_DT'].shift(1)
exTimeDiff = assetGrp.fillna(0).reset_index(level=1)

but always received an error like the following when trying to merge the result back into the dataframe: NotImplementedError: Index._join_level on non-unique index is not implemented

Any advice would be greatly appreciated.

Per Uvar's comment above, assetGrp.diff() performs the same operation with lower memory overhead.
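For illustration, here is a minimal sketch of the diff() approach on a small made-up frame. Only the column name CP_EX_DT comes from the question; the ASSET_ID grouping key, the sample values, and the assumption that CP_EX_DT holds datetimes are hypothetical. Because the column is assumed to be datetime, the gaps are filled with a zero Timedelta rather than 0.

```python
import pandas as pd

# Hypothetical sample data; only the column name CP_EX_DT is from the question.
df = pd.DataFrame(
    {
        "ASSET_ID": ["A", "A", "A", "B", "B"],
        "CP_EX_DT": pd.to_datetime(
            ["2020-01-01", "2020-01-03", "2020-01-10", "2020-02-01", "2020-02-02"]
        ),
    }
)

# Stand-in for the question's assetGrp; the grouping key is an assumption.
asset_grp = df.groupby("ASSET_ID")

# Row-to-row difference within each group, equivalent to x - x.shift(1).
# The first row of every group has no predecessor, so it comes back as NaT.
ex_time_diff = asset_grp["CP_EX_DT"].diff().fillna(pd.Timedelta(0))

# The result keeps df's original index, so it can be assigned back directly
# without a merge or join.
df["EX_TIME_DIFF"] = ex_time_diff
print(df)
```

Since the result is aligned on the original row index, plain column assignment sidesteps the join that raised the NotImplementedError in the loop-based attempt.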
