
Python GroupBy sort Descending by column within grouping

I have a dataset with the following columns: ID, Old Stage, New Stage and Cycle Number. Each ID has multiple rows (2+), depicting a series of back-and-forth transitions between old and new stage; the order of these transitions is given by the Cycle Number.

I am trying to group multiple rows by ID (that part works), but within each group I want to sort by Cycle Number in descending order. For example, if ID 1 has 6 cycles, I want cycle #6 to be listed first, then 5, 4, 3, etc.

grouped2 = df.groupby(['ID', 'Old_Stage', 'New_Stage'], as_index=False)['Cycle_Number'].max().sort_values(['Cycle_Number'], ascending=False)
print(grouped2)

This is what I tried; however, it only sorts the Cycle Numbers in descending order overall, not within each ID group.
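The behaviour described above can be reproduced with a small sketch. The rows for ID 100 are taken from the question; the rows for ID 200 are made-up values, added only so that two groups exist and can interleave. The helper variable `id_switches` is likewise just an illustration.

```python
import pandas as pd

# Rows for ID 100 come from the question; the rows for ID 200 are
# hypothetical and exist only so that a second group can interleave.
df = pd.DataFrame({
    "ID": [100, 100, 100, 100, 200, 200, 200],
    "Old_Stage": ["In Progress", "Not Started", "Under Review", "Completed",
                  "Not Started", "In Progress", "Under Review"],
    "New_Stage": ["Under Review", "In Progress", "Completed", "In Progress",
                  "In Progress", "Under Review", "Completed"],
    "Cycle_Number": [1, 0, 2, 3, 0, 1, 2],
})

# Same expression as in the question: the final sort uses Cycle_Number only,
# so the ID column plays no part in the row order.
grouped2 = (
    df.groupby(["ID", "Old_Stage", "New_Stage"], as_index=False)["Cycle_Number"]
      .max()
      .sort_values(["Cycle_Number"], ascending=False)
)

# Count how often the ID changes from one row to the next.
# If the rows were kept together per ID, this would be exactly 1.
id_switches = int((grouped2["ID"] != grouped2["ID"].shift()).sum()) - 1

print(grouped2)
print("ID switches between consecutive rows:", id_switches)
```

Because the sort key is Cycle_Number alone, rows from both IDs that share a cycle number end up next to each other, which is why the result is no longer grouped by ID.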

EDIT

Current dataframe:

|ID |Old Stage   |New Stage   |Cycle Number|
|100|In Progress |Under Review|1           |
|100|Not Started |In Progress |0           |
|100|Under Review|Completed   |2           |
|100|Completed   |In Progress |3           |

Desired dataframe:

|ID |Old Stage   |New Stage   |Cycle Number|
|100|Completed   |In Progress |3           |
|   |Under Review|Completed   |2           |
|   |In Progress |Under Review|1           |
|   |Not Started |In Progress |0           |

As furas and jezrael mentioned, using pandas.DataFrame.sort_values with ID ascending and Cycle Number descending, as follows, should solve OP's problem:

df = df.sort_values(by=['ID', 'Cycle Number'], ascending=[True, False])

[Out]:
    ID     Old Stage     New Stage  Cycle Number
3  100     Completed   In Progress             3
2  100  Under Review     Completed             2
0  100   In Progress  Under Review             1
1  100   Not Started   In Progress             0
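For anyone who wants to reproduce this output, here is a minimal self-contained sketch. It assumes the dataframe is built by hand from the "Current dataframe" table in the question; the variable name `sorted_df` is only illustrative.

```python
import pandas as pd

# Dataframe rebuilt by hand from the "Current dataframe" table in the question
df = pd.DataFrame({
    "ID": [100, 100, 100, 100],
    "Old Stage": ["In Progress", "Not Started", "Under Review", "Completed"],
    "New Stage": ["Under Review", "In Progress", "Completed", "In Progress"],
    "Cycle Number": [1, 0, 2, 3],
})

# Sort by ID ascending first, then by Cycle Number descending within each ID
sorted_df = df.sort_values(by=["ID", "Cycle Number"], ascending=[True, False])

print(sorted_df)
```

Listing ID first in `by` is what keeps the rows of each ID together; the second key only decides the order inside each ID.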

However, OP mentioned:

It doesn't keep it grouped by ID

It seems that OP is referring to the order of the index. As one can see in the previous output, the index goes 3, 2, 0, 1 and, IIUC, OP wants it to go 0, 1, 2, and so on.

If that is the case, what is missing is just .reset_index(drop=True), as follows:

df = df.sort_values(by=['ID', 'Cycle Number'], ascending=[True, False]).reset_index(drop=True)

[Out]:
    ID     Old Stage     New Stage  Cycle Number
0  100     Completed   In Progress             3
1  100  Under Review     Completed             2
2  100   In Progress  Under Review             1
3  100   Not Started   In Progress             0
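As a side note, the same result can probably be obtained in one call through the `ignore_index` parameter of `sort_values`, which is available in pandas 1.0 and later. The sketch below assumes a hand-built dataframe: the rows for ID 100 are from the question, while the two rows for ID 200 are made up to show that rows stay together per ID. The names `by_reset` and `by_flag` are only illustrative.

```python
import pandas as pd

# ID 100 rows are from the question; ID 200 rows are hypothetical and the
# input order is shuffled on purpose.
df = pd.DataFrame({
    "ID": [200, 100, 100, 200, 100, 100],
    "Old Stage": ["Not Started", "In Progress", "Not Started",
                  "In Progress", "Under Review", "Completed"],
    "New Stage": ["In Progress", "Under Review", "In Progress",
                  "Under Review", "Completed", "In Progress"],
    "Cycle Number": [0, 1, 0, 1, 2, 3],
})

# Variant from the answer: sort, then rebuild the index as 0..n-1
by_reset = (
    df.sort_values(by=["ID", "Cycle Number"], ascending=[True, False])
      .reset_index(drop=True)
)

# Variant with ignore_index=True (pandas >= 1.0): the sorted result is
# relabelled 0..n-1 directly by sort_values
by_flag = df.sort_values(
    by=["ID", "Cycle Number"], ascending=[True, False], ignore_index=True
)

print(by_flag)
print("Same result:", by_reset.equals(by_flag))
```

Either variant keeps all rows of one ID adjacent, with Cycle Number descending inside each ID and a fresh index on the result.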
