Alternative to pandas groupby with lambda and diff

Assume I have the DataFrame df below:

    ID  V
0   A   1
1   A   2
2   B   4
3   B   3

And the desired output is:

    V
0   NaN
1   1.0
2   NaN
3   -1.0

This can be done using groupby and a lambda with diff:

df.groupby('ID').apply(lambda x: x.diff())

I am trying to come up with a solution that doesn't rely on lambda, as this quickly becomes very slow. Any ideas?

UPDATE

Performance comparison between (1) using groupby, lambda and diff, and (2) using only groupby and diff:

(1)

3.67 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

(2)

2.42 ms ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
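For anyone who wants to reproduce a comparison like this, here is a rough sketch using the standard timeit module. The benchmark frame (`bench`, 1,000 rows over 100 integer groups) and the helper names `with_lambda` and `without_lambda` are my own assumptions, not the data behind the numbers above, so expect different absolute timings. The lambda variant selects the V column and passes `group_keys=False` so that the result keeps the original index.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 1,000 rows spread over 100 groups
rng = np.random.default_rng(0)
bench = pd.DataFrame({
    "ID": rng.integers(0, 100, size=1_000),
    "V": rng.normal(size=1_000),
})


def with_lambda():
    # (1) groupby + lambda + diff; group_keys=False keeps the original index
    return bench.groupby("ID", group_keys=False)["V"].apply(lambda s: s.diff())


def without_lambda():
    # (2) groupby + diff only
    return bench.groupby("ID")["V"].diff()


# Sanity check: both variants should produce the same values
pd.testing.assert_series_equal(
    with_lambda().sort_index(), without_lambda(), check_names=False
)

for fn in (with_lambda, without_lambda):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

The gap between the two tends to depend on the number of groups, since the lambda is called once per group.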

Use .agg and pass 'diff':

 df.groupby('ID')['V'].agg('diff')

0    NaN
1    1.0
2    NaN
3   -1.0

Well, in this case, groupby objects directly support diff:

>>> df
  ID  V
0  A  1
1  A  2
2  B  4
3  B  3
>>> df.groupby('ID').diff()
     V
0  NaN
1  1.0
2  NaN
3 -1.0
>>>
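As a self-contained script, the same call might look like the sketch below. Selecting the V column before calling diff and storing the result under `V_diff` are my own choices for illustration (`V_diff` is a hypothetical column name), not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "A", "B", "B"], "V": [1, 2, 4, 3]})

# Per-group difference; selecting V explicitly means ID is never diffed
df["V_diff"] = df.groupby("ID")["V"].diff()

print(df)
```

The first row of each group has no predecessor within the group, so it comes out as NaN.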

But I'm not sure this will actually improve your performance. Using .apply on columns, i.e. along the first axis, shouldn't be slower than the above; it is basically equivalent (unlike applying a function row by row).
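If groupby itself turns out to be the bottleneck, one option that avoids it entirely is to take a plain diff and blank out the rows where the ID changes. This is not from the answers above, just a sketch of a different technique. It assumes rows are compared in ID-sorted order, so it sorts with a stable sort first and then restores the original row order. The names `ordered`, `same_group` and `result` are mine.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "A", "B", "B"], "V": [1, 2, 4, 3]})

# Stable sort so rows of the same ID are contiguous and keep their order
ordered = df.sort_values("ID", kind="mergesort")

# True where the previous row belongs to the same ID
same_group = ordered["ID"].eq(ordered["ID"].shift())

# Plain diff over the whole column, masked to NaN at group boundaries,
# then put back in the original row order
result = ordered["V"].diff().where(same_group).reindex(df.index)

print(result)
```

Whether this is actually faster than groupby('ID').diff() would need to be measured on the real data.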
