Alternative to using lambda and diff with pandas groupby

Question

Assume I have the df below:

  ID  V
0  A  1
1  A  2
2  B  4
3  B  3

And the desired output is:

    V
0   NaN
1   1.0
2   NaN
3   -1.0

This can be done using groupby and a lambda that calls diff:

df.groupby('ID').apply(lambda x: x.diff())

I am trying to come up with a solution that doesn't rely on lambda, as this quickly becomes very slow. Any ideas?

UPDATE

Performance comparison between (1) groupby with lambda and diff, and (2) groupby with diff only:

(1) 3.67 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

(2) 2.42 ms ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
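For anyone who wants to repeat this comparison outside a notebook, here is a minimal sketch using the standard-library timeit module. The frame size, group count, random data, and the helper names with_lambda and with_builtin are all assumptions chosen for illustration; they are not the data behind the timings above, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: sizes and values are illustrative only.
rng = np.random.default_rng(0)
n_rows, n_groups = 10_000, 500
big = pd.DataFrame({
    "ID": rng.integers(0, n_groups, size=n_rows),
    "V": rng.normal(size=n_rows),
})


def with_lambda():
    # (1) Python-level function called once per group.
    return big.groupby("ID", group_keys=False)["V"].apply(lambda s: s.diff())


def with_builtin():
    # (2) Built-in groupby diff, no per-group Python call.
    return big.groupby("ID")["V"].diff()


t_lambda = timeit.timeit(with_lambda, number=5)
t_builtin = timeit.timeit(with_builtin, number=5)
print(f"lambda: {t_lambda:.4f}s  built-in: {t_builtin:.4f}s")
```

Increasing n_groups is the interesting knob here, since the lambda variant makes one Python call per group.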

Answer 1

Use .agg and pass 'diff':

df.groupby('ID')['V'].agg('diff')

0    NaN
1    1.0
2    NaN
3   -1.0
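Since diff returns one value per input row rather than one per group, the same idea can also be written with .transform('diff'), which some readers may find states the intent more directly. The sketch below rebuilds the four-row frame from the question and compares that spelling with calling diff on the grouped column; the variable names are mine.

```python
import pandas as pd

# Four-row sample frame from the question.
df = pd.DataFrame({"ID": ["A", "A", "B", "B"], "V": [1, 2, 4, 3]})

grouped = df.groupby("ID")["V"]

# Pass the method name as a string; no lambda involved.
via_transform = grouped.transform("diff")

# Call the method directly on the grouped column.
via_method = grouped.diff()

print(via_transform)
print(via_transform.equals(via_method))
```

Both results keep the original row index, so either can be assigned straight back to a new column of df.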

Answer 2

Well, in this case, groupby objects directly support diff:

>>> df
  ID  V
0  A  1
1  A  2
2  B  4
3  B  3
>>> df.groupby('ID').diff()
     V
0  NaN
1  1.0
2  NaN
3 -1.0
>>>

But I'm not sure this will actually improve your performance. Using .apply on columns, i.e. across the first axis, shouldn't be slower than the above; it is basically equivalent (unlike applying across the rows).
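If the goal is to avoid groupby altogether, one alternative that is not from either answer is to take a plain diff over the whole column and blank out the rows where the ID changes. This is only a sketch, and it assumes that the rows of each ID are contiguous, as they are in the sample frame; otherwise the frame would need a stable sort by ID first. The names same_group and masked are illustrative.

```python
import pandas as pd

# Four-row sample frame from the question.
df = pd.DataFrame({"ID": ["A", "A", "B", "B"], "V": [1, 2, 4, 3]})

# True where a row has the same ID as the row before it,
# False on the first row of every group.
# Assumption: rows belonging to one ID are adjacent.
same_group = df["ID"].eq(df["ID"].shift())

# Ungrouped diff, with NaN wherever the difference crossed a group boundary.
masked = df["V"].diff().where(same_group)

# Reference result from the built-in groupby diff, for comparison.
expected = df.groupby("ID")["V"].diff()
print(masked.equals(expected))
```

Whether this is faster than groupby(...).diff() depends on the data and pandas version, so it is worth timing on the real frame before adopting it.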

2 answers

Answer 1: 2020-07-22 13:21:22
Answer 2 (accepted): 2020-07-22 13:21:30
