
groupby of a groupby to select values in pandas

I have a data frame as follows:

marker    date         value       identifier

EA    2007-01-01      0.33            55
EA    2007-01-01      0.73            56
EA    2007-01-01      0.51            57
EA    2007-02-01      0.13            55
EA    2007-02-01      0.23            57
EA    2007-03-01      0.82            55
EA    2007-03-01      0.88            56
EB    2007-01-01      0.13            45
EB    2007-01-01      0.74            46
EB    2007-01-01      0.56            47
EB    2007-02-01      0.93            45
EB    2007-02-01      0.23            47
EB    2007-03-01      0.82            45
EB    2007-03-01      0.38            46
EB    2007-03-01      0.19            47

Now I want to select rows from this data frame by marker value, so I use

df.groupby('marker').get_group('EA')

But I also want the mean of the value, and my date index contains duplicates. Because the filtered frame has a different index from the original, I have to do the groupby twice, leading to

df.groupby('marker').get_group('EA').groupby(df.groupby('marker').get_group('EA').index.date)['value'].mean().plot()

which is clearly not very legible. How can I accomplish this without creating an intermediate variable?

You can't, for the reason you gave above in your comment about the AssertionError. Pandas expects the grouping key for the (second) groupby to be a sequence of exactly the same length as the DataFrame being grouped. If you're unwilling to first create a DataFrame holding the EA rows, you're basically stuck with creating it again on the fly.

Not only is that less legible, it is also unnecessarily expensive. Speaking of which, I'd rewrite your code like this:

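# Filter the EA rows once with a boolean mask.
eas = df[df.marker == 'EA']
# Mean of value per date (assumes date is a column; use eas.index otherwise).
eas.value.groupby(eas.date).mean().plot()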

Doing a groupby and keeping only a single group is a very expensive way of simply filtering by the key.
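To make the comparison concrete, here is a minimal, self-contained sketch built on the sample data from the question. It assumes date is an ordinary column rather than the index, and the names raw, via_groupby, same_rows and ea_means are made up for illustration. It checks that the boolean mask selects the same rows as groupby plus get_group, then computes the per-date mean for the EA rows.

```python
import io

import pandas as pd

# Sample rows copied from the question; date is treated as a regular column
# here (an assumption, since the question also mentions a date index).
raw = """marker date value identifier
EA 2007-01-01 0.33 55
EA 2007-01-01 0.73 56
EA 2007-01-01 0.51 57
EA 2007-02-01 0.13 55
EA 2007-02-01 0.23 57
EA 2007-03-01 0.82 55
EA 2007-03-01 0.88 56
EB 2007-01-01 0.13 45
EB 2007-01-01 0.74 46
EB 2007-01-01 0.56 47
EB 2007-02-01 0.93 45
EB 2007-02-01 0.23 47
EB 2007-03-01 0.82 45
EB 2007-03-01 0.38 46
EB 2007-03-01 0.19 47
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])

# Boolean-mask filter: keeps only the EA rows.
eas = df[df["marker"] == "EA"]

# The groupby/get_group route selects the same rows, but groups every marker first.
via_groupby = df.groupby("marker").get_group("EA")
same_rows = eas.equals(via_groupby)

# Mean of value per date, for the EA rows only.
ea_means = eas.groupby("date")["value"].mean()
print(same_rows)
print(ea_means)
```

Chaining .plot() onto ea_means would give the plot from the question, provided matplotlib is installed.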
