Alternative to "apply" on a pandas GroupBy object

Question

So I have the following:

timeDiffFunc = lambda x: x['CP_EX_DT'] - x['CP_EX_DT'].shift(1)
exTimeDiff = assetGrp.apply(timeDiffFunc).fillna(0).reset_index(level=1)

But this uses so much memory that my system crashes (similar to the issue seen here: Memory leak in Pandas.groupby.apply()?).

My question is: how can I convert this to code that does not use the apply function? I tried variations of:

for i, (name,grp) in enumerate(assetGrp):
  grp = grp['CP_EX_DT'] - grp['CP_EX_DT'].shift(1)
exTimeDiff = assetGrp.fillna(0).reset_index(level=1)

but I always received an error like "NotImplementedError: Index._join_level on non-unique index is not implemented" when trying to merge the result back into the dataframe.

Any advice would be greatly appreciated.
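For context on the loop attempt above: assigning to grp inside the loop only rebinds the loop variable, so none of the per-group differences are kept, and merging per-group results back by index label is likely where a non-unique index causes trouble. A hedged sketch of an explicit loop that sidesteps both issues writes results by row position instead, using GroupBy.indices (a dict from each group key to the integer positions of its rows). The grouping column name asset, the toy data, and a tz-naive datetime CP_EX_DT column are assumptions for illustration, not the asker's actual setup.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a deliberately non-unique index; the 'asset'
# grouping column is an assumption, 'CP_EX_DT' comes from the question.
df = pd.DataFrame(
    {
        "asset": ["A", "A", "B", "A", "B"],
        "CP_EX_DT": pd.to_datetime(
            ["2017-01-01", "2017-01-03", "2017-01-02", "2017-01-10", "2017-01-05"]
        ),
    },
    index=[7, 7, 8, 8, 9],
)
assetGrp = df.groupby("asset")

dates = df["CP_EX_DT"].to_numpy()                  # plain datetime64 array
out = np.zeros(len(df), dtype="timedelta64[ns]")   # first row of each group stays 0

# GroupBy.indices maps each group key to the integer positions of its rows,
# so results are written by position and no join on the index is needed.
for _name, pos in assetGrp.indices.items():
    vals = dates[pos]
    out[pos[1:]] = vals[1:] - vals[:-1]            # x - x.shift(1) within the group

# Mirror the question's fillna(0): a missing date would otherwise yield NaT.
out[np.isnat(out)] = np.timedelta64(0, "ns")

df["timeDiff"] = out  # assigning a NumPy array is positional
print(df)
```

Because a plain NumPy array is assigned by position, the duplicated labels in this toy index do not get in the way.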

Answer 1

As per Uvar's comment above, assetGrp.diff() performs the same operation with lower memory overhead.
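A minimal sketch of that suggestion, assuming a tz-naive datetime column CP_EX_DT and a hypothetical grouping column named asset (the question does not show how assetGrp was built). Selecting the column before calling diff() keeps the operation to that one column; calling diff() on the whole GroupBy differences every non-grouping column, which can fail if some of them are not numeric or datetime.

```python
import pandas as pd

# Hypothetical toy data: the 'asset' grouping column is an assumption;
# 'CP_EX_DT' is the column name used in the question.
df = pd.DataFrame({
    "asset": ["A", "A", "B", "A", "B"],
    "CP_EX_DT": pd.to_datetime(
        ["2017-01-01", "2017-01-03", "2017-01-02", "2017-01-10", "2017-01-05"]
    ),
})
assetGrp = df.groupby("asset")

# GroupBy.diff() computes x - x.shift(1) within each group without calling
# a Python lambda once per group; the first row of each group is NaT.
exTimeDiff = assetGrp["CP_EX_DT"].diff()

# Same role as fillna(0) in the question, written as an explicit zero duration.
exTimeDiff = exTimeDiff.fillna(pd.Timedelta(0))

# The result is indexed like df, so it can be attached as a column directly.
df["timeDiff"] = exTimeDiff
print(df)
```

Since the grouped diff returns a result aligned with the original rows, there is no separate merge step, which is where the non-unique-index error appeared in the question.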

Answer 1: score 0, posted 2017-08-11 17:28:41