
Pandas groupby.diff() not returning expected output

I have an outer group and an inner group, and I want to find the differences within each inner group, nested under its outer group. Normally I can nest the inner group within each outer group using groupby but, for some reason, the diff function for groupby returns a flat vector instead of a nested result.

import numpy as np
import pandas as pd
df = pd.DataFrame({'inner':list('aabbccddee'),'outer':[0,0,1,1,0,0,1,1,0,0],
    'value':np.random.randint(0,100,10)})

    inner  outer  value
0     a      0     78
1     a      0     68
2     b      1     78
3     b      1     22
4     c      0     53
5     c      0     25
6     d      1     82
7     d      1     38
8     e      0      2
9     e      0     39

If I want the sum, for example, of each inner group within each outer group, I simply use groupby:

In [19]: df.groupby(['outer','inner']).sum()
Out[19]:
             value
outer inner
0     a        146
      c         78
      e         41
1     b        100
      d        120

The above is the correct output, and the same approach works for every other function I have tried except diff. When I use diff, I want output in a similar format, but instead I get:

In [20]: df.groupby(['outer','inner']).diff()
Out[20]:
   value
0    NaN
1  -10.0
2    NaN
3  -56.0
4    NaN
5  -28.0
6    NaN
7  -44.0
8    NaN
9   37.0

The above is equivalent to df.groupby(['inner']).value.diff(), so it seems groupby is not considering the outer group. I can find workarounds for this without difficulty, but using groupby would be more elegant and succinct. Does anyone know why this is happening and how it could be remedied?
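A quick check of that equivalence, as a sketch on made-up data (the frame `demo` and its values are hypothetical, not taken from the question): in the sample above every inner label belongs to exactly one outer group, so grouping by both keys and grouping by inner alone produce the same groups. When an inner label appears under two outer groups, the two results differ, which suggests groupby does take the outer key into account.

```python
import pandas as pd

# Hypothetical data: the inner label 'a' occurs in both outer groups.
demo = pd.DataFrame({
    'inner': list('aaaa'),
    'outer': [0, 0, 1, 1],
    'value': [1, 4, 10, 20],
})

# Grouping by both keys restarts the diff at each (outer, inner) pair.
by_both = demo.groupby(['outer', 'inner'])['value'].diff()

# Grouping by inner alone treats all four rows as one group.
by_inner = demo.groupby('inner')['value'].diff()

print(by_both.tolist())   # [nan, 3.0, nan, 10.0]
print(by_inner.tolist())  # [nan, 3.0, 6.0, 10.0]
```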

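Functions like s.diff() and cumsum are not aggregation functions: they return one value per input row, so the result comes back as a Series aligned with the original index rather than one row per group. You could use np.diff() inside apply here instead; because each group in this data has exactly two rows, np.diff() returns a single element that .item() can extract. Example below: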

print(df.groupby(['outer','inner'])['value'].apply(lambda x: np.diff(x).item()))

outer  inner
0      a       -10
       c       -28
       e        37
1      b       -56
       d       -44
