
pandas: sorting observations within groupby groups

According to the answer to "pandas groupby sort within groups", sorting observations within each group requires a second groupby on the result of the first groupby. Why is a second groupby needed? I would have assumed that the observations are already arranged into groups after the first groupby, and that all that is needed is a way to enumerate those groups (and run apply with order).

Because once you apply a function after a groupby, the results are combined back into a normal, ungrouped data frame. Using groupby together with a groupby method like sort should be thought of as a Split-Apply-Combine operation.

The groupby splits the original data frame and the method is applied to each group, but then the results are implicitly combined again.
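To make the split-apply-combine point concrete, here is a minimal sketch. The sample data is made up for illustration (only the column names job, source and count come from this thread), and the per-group sort is just one possible function to apply:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "market", "market"],
    "source": ["A", "B", "C", "A", "B"],
    "count": [2, 4, 6, 5, 3],
})

# Split: groupby returns a lazy DataFrameGroupBy object, not a data frame.
grouped = df.groupby("job", group_keys=True)

# Apply + combine: each group is sorted, then pandas concatenates the pieces.
result = grouped[["source", "count"]].apply(
    lambda g: g.sort_values("count", ascending=False)
)

print(type(grouped).__name__)  # the grouped object
print(type(result).__name__)   # a plain DataFrame again, no longer grouped
```

Since result is an ordinary data frame, any further per-group step (such as taking the first rows of each group) needs its own groupby.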

In that other question, they could have reversed the order of operations (sorted first) and avoided the two groupbys. They could do:

df.sort_values(['job','count'],ascending=False).groupby('job').head(3)
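As a runnable sketch of the sort-first approach (the data below is invented for illustration; sort_values is used because DataFrame.sort no longer exists in current pandas):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "sales", "market", "market"],
    "source": ["A", "B", "C", "D", "A", "B"],
    "count": [2, 4, 6, 1, 5, 3],
})

# Sort the whole frame once, then take the first 3 rows of each job.
# head() acts as a filter, so the rows keep the sorted order.
top3 = (
    df.sort_values(["job", "count"], ascending=False)
      .groupby("job")
      .head(3)
)

print(top3)
```

Only one groupby is involved here, because the sorting happens on the ungrouped frame.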

They need a second group by in that case because, on top of sorting, they want to keep only the top 3 rows of each group.

If you just need to sort after a group by, you can do:

df_res = df.groupby(['job','source']).agg({'count':'sum'}).sort_values(['job','count'],ascending=False)

One group by is enough.

And if you want to keep the 3 rows with the highest count for each group, you can group again and use the head() function:

df_res.groupby('job').head(3)
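Putting the two steps together, a minimal sketch with made-up data might look like this. It assumes a reasonably recent pandas, where sort_values and groupby accept index level names such as job after the aggregation:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "sales", "sales", "market", "market"],
    "source": ["A", "A", "B", "C", "D", "A", "B"],
    "count": [2, 3, 4, 6, 1, 5, 3],
})

# First group by: aggregate counts per (job, source), then sort the result.
# After agg, job and source are index levels; sort_values can still use them.
df_res = (
    df.groupby(["job", "source"])
      .agg({"count": "sum"})
      .sort_values(["job", "count"], ascending=False)
)

# Second group by: keep only the 3 highest counts within each job.
top3 = df_res.groupby("job").head(3)

print(top3)
```

The second groupby is only there for the per-group head(3); the sorting itself needed just the one aggregation.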
