Sorting DataFrame groups while keeping the original rows

Question

Here is the original dataframe:

Name Version Cost
A    0.0.3   1.7
C    0.0.2   2.5
A    0.0.1   1.0
C    0.0.1   2.4
B    0.0.2   3.7
B    0.0.1   3.5
A    0.0.2   1.4
C    0.0.3   2.6
B    0.0.3   3.8

After grouping and sorting within groups using the following code:

df = df.sort_values(['Name', 'Version'], ascending=[True, False])
df = df.groupby(['Name'], sort=False)
df = df.apply(lambda x: x.sort_values(['Cost'], ascending=False))

Now I have this dataframe, where Cost is sorted within each group and the groups are ordered alphabetically:

     Name Version Cost
Name
A    A    0.0.3   1.7
     A    0.0.2   1.4
     A    0.0.1   1.0
B    B    0.0.3   3.8
     B    0.0.2   3.7
     B    0.0.1   3.5
C    C    0.0.3   2.6
     C    0.0.2   2.5
     C    0.0.1   2.4

Now I would like to sort the groups by the total cost of each group, so that the expected result looks like this:

Name Version Cost
B    0.0.3   3.8
B    0.0.2   3.7
B    0.0.1   3.5
C    0.0.3   2.6
C    0.0.2   2.5
C    0.0.1   2.4
A    0.0.3   1.7
A    0.0.2   1.4
A    0.0.1   1.0

How can I achieve this without losing any rows?

Answer 1

You can create a temporary column, sort by it, and then drop that column:

df["tmp"] = df.groupby("Name")["Cost"].transform("sum")
df = df.sort_values(by="tmp", ascending=False).drop(columns="tmp")
print(df)

Prints:

  Name Version  Cost
3    B   0.0.3   3.8
4    B   0.0.2   3.7
5    B   0.0.1   3.5
6    C   0.0.3   2.6
7    C   0.0.2   2.5
8    C   0.0.1   2.4
0    A   0.0.3   1.7
1    A   0.0.2   1.4
2    A   0.0.1   1.0

df used:

Name Version Cost
A    0.0.3   1.7
A    0.0.2   1.4
A    0.0.1   1.0
B    0.0.3   3.8
B    0.0.2   3.7
B    0.0.1   3.5
C    0.0.3   2.6
C    0.0.2   2.5
C    0.0.1   2.4
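
One caveat with the temporary-column approach: for a single sort column, sort_values defaults to quicksort, which is not guaranteed to be stable, so the within-group order is not guaranteed to survive the sort by group total. A minimal sketch, assuming the question's original (unsorted) dataframe and that Cost should be descending within each group, passes kind="stable" explicitly. The names within and result are only for illustration:

```python
import pandas as pd

# The question's original (unsorted) dataframe.
df = pd.DataFrame({
    "Name": ["A", "C", "A", "C", "B", "B", "A", "C", "B"],
    "Version": ["0.0.3", "0.0.2", "0.0.1", "0.0.1", "0.0.2",
                "0.0.1", "0.0.2", "0.0.3", "0.0.3"],
    "Cost": [1.7, 2.5, 1.0, 2.4, 3.7, 3.5, 1.4, 2.6, 3.8],
})

# Step 1: order rows inside each group (Cost descending).
within = df.sort_values(["Name", "Cost"], ascending=[True, False])

# Step 2: broadcast each group's total cost to its rows and order the groups by it.
# kind="stable" keeps the within-group order from step 1 for rows that tie on tmp.
result = (
    within.assign(tmp=within.groupby("Name")["Cost"].transform("sum"))
          .sort_values("tmp", ascending=False, kind="stable")
          .drop(columns="tmp")
)
print(result)
```

Every row of a group carries the same tmp value, so the stable sort moves whole groups around without reordering the rows inside them.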

Answer 2

Starting from your original dataframe, you can generate a helper column of group sums with transform, then sort by that column and by the Version column, both in descending order:

group_sums = df.groupby("Name").Cost.transform("sum")
out = (df.assign(sorter=group_sums)
         .sort_values(["sorter", "Version"], ascending=False, ignore_index=True)
         .drop(columns="sorter"))

where we drop the helper column sorter after sorting, to get

>>> out

  Name Version  Cost
0    B   0.0.3   3.8
1    B   0.0.2   3.7
2    B   0.0.1   3.5
3    C   0.0.3   2.6
4    C   0.0.2   2.5
5    C   0.0.1   2.4
6    A   0.0.3   1.7
7    A   0.0.2   1.4
8    A   0.0.1   1.0
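
Note that Version is compared as a string here, so the order is lexicographic: "0.0.10" would sort before "0.0.2". That makes no difference for the sample data, but if versions can have multi-digit components, something along these lines may help. This is only a sketch: it assumes every version has exactly three numeric components, and the sample data and the column names major, minor and patch are made up for illustration:

```python
import pandas as pd

# Hypothetical data with multi-digit version components.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B"],
    "Version": ["0.0.2", "0.0.10", "0.0.1", "0.0.9", "0.0.11"],
    "Cost": [1.0, 1.5, 0.5, 3.0, 4.0],
})

# Split "x.y.z" into three integer columns so versions compare numerically.
parts = df["Version"].str.split(".", expand=True).astype(int)
parts.columns = ["major", "minor", "patch"]

helpers = ["sorter", "major", "minor", "patch"]
out = (
    df.join(parts)
      .assign(sorter=df.groupby("Name")["Cost"].transform("sum"))
      .sort_values(helpers, ascending=False, ignore_index=True)
      .drop(columns=helpers)
)
print(out)
```

Here group B (total 7.0) comes before group A (total 3.0), and inside A the version 0.0.10 is placed before 0.0.2, which a plain string sort would not do.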

Answer 3

You can use the key argument of sort_values (available since pandas 1.1.0) to achieve the same result as the other answers:

print(df.sort_values("Cost", ascending=False,
                     key=lambda _: df.groupby("Name")["Cost"].transform("sum")))

  Name Version  Cost
3    B   0.0.3   3.8
4    B   0.0.2   3.7
5    B   0.0.1   3.5
6    C   0.0.3   2.6
7    C   0.0.2   2.5
8    C   0.0.1   2.4
0    A   0.0.3   1.7
1    A   0.0.2   1.4
2    A   0.0.1   1.0
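
The lambda above ignores its argument and closes over df. A variation that uses the Series passed to key might look like the sketch below, assuming the question's original (unsorted) dataframe. The group labels are passed as a NumPy array because, as far as I can tell, the Series handed to key does not necessarily carry df's index, so grouping by position avoids relying on index alignment. kind="stable" is my addition so that rows keep their original relative order inside each group:

```python
import pandas as pd

# The question's original (unsorted) dataframe.
df = pd.DataFrame({
    "Name": ["A", "C", "A", "C", "B", "B", "A", "C", "B"],
    "Version": ["0.0.3", "0.0.2", "0.0.1", "0.0.1", "0.0.2",
                "0.0.1", "0.0.2", "0.0.3", "0.0.3"],
    "Cost": [1.7, 2.5, 1.0, 2.4, 3.7, 3.5, 1.4, 2.6, 3.8],
})

# Group labels by position, independent of any index.
names = df["Name"].to_numpy()

# key replaces each Cost value with its group's total before sorting;
# the rows themselves are untouched and no helper column is created.
result = df.sort_values(
    "Cost",
    ascending=False,
    kind="stable",
    key=lambda cost: cost.groupby(names).transform("sum"),
)
print(result)
```

With the unsorted input, the groups come out as B, C, A, and each group's rows stay in the order they had in df (for B: index 4, 5, 8). Sort within groups first if you also need Cost descending inside each group.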

Answer 4

Try:

df.assign(groupsum=df.groupby(level=0)['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=False)

Output:

       Name Version  Cost  groupsum
Name                               
B    8    B   0.0.3   3.8      11.0
     4    B   0.0.2   3.7      11.0
     5    B   0.0.1   3.5      11.0
C    7    C   0.0.3   2.6       7.5
     1    C   0.0.2   2.5       7.5
     3    C   0.0.1   2.4       7.5
A    0    A   0.0.3   1.7       4.1
     6    A   0.0.2   1.4       4.1
     2    A   0.0.1   1.0       4.1

You can also add reset_index(drop=True) at the end:

  Name Version  Cost  groupsum
0    B   0.0.3   3.8      11.0
1    B   0.0.2   3.7      11.0
2    B   0.0.1   3.5      11.0
3    C   0.0.3   2.6       7.5
4    C   0.0.2   2.5       7.5
5    C   0.0.1   2.4       7.5
6    A   0.0.3   1.7       4.1
7    A   0.0.2   1.4       4.1
8    A   0.0.1   1.0       4.1

Or, using your "original" dataframe above:

df.assign(groupsum=df.groupby('Name')['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=[False,False])

Output:

  Name Version  Cost  groupsum
8    B   0.0.3   3.8      11.0
4    B   0.0.2   3.7      11.0
5    B   0.0.1   3.5      11.0
7    C   0.0.3   2.6       7.5
1    C   0.0.2   2.5       7.5
3    C   0.0.1   2.4       7.5
0    A   0.0.3   1.7       4.1
6    A   0.0.2   1.4       4.1
2    A   0.0.1   1.0       4.1
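
If you do not want the groupsum helper in the result, you can drop it at the end of the chain. Below is a minimal sketch of the first variant (groupby(level=0)), assuming a frame indexed by (Name, original position) like the grouped output in the question. That index is rebuilt by hand with set_index here rather than through groupby.apply, and the names indexed and out are only for illustration:

```python
import pandas as pd

# The question's original (unsorted) dataframe.
df = pd.DataFrame({
    "Name": ["A", "C", "A", "C", "B", "B", "A", "C", "B"],
    "Version": ["0.0.3", "0.0.2", "0.0.1", "0.0.1", "0.0.2",
                "0.0.1", "0.0.2", "0.0.3", "0.0.3"],
    "Cost": [1.7, 2.5, 1.0, 2.4, 3.7, 3.5, 1.4, 2.6, 3.8],
})

# Mimic the grouped frame: Name as the outer index level, the original
# position as the inner level, with Name also kept as a regular column.
indexed = df.set_index("Name", append=True, drop=False).swaplevel()

out = (
    indexed.assign(groupsum=indexed.groupby(level=0)["Cost"].transform("sum"))
           .sort_values(["groupsum", "Version"], ascending=False)
           .drop(columns="groupsum")   # remove the helper column
           .reset_index(drop=True)     # flatten back to a plain 0..n-1 index
)
print(out)
```

Sorting on both groupsum and Version leaves no ties in this data, so the result does not depend on the sort algorithm.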

4 answers

Answer 1
1 2021-07-13 18:11:56

Answer 2
1 2021-07-13 18:14:14

Answer 3
1 2021-07-13 18:29:21

Answer 4
1 (accepted) 2021-07-13 19:54:18
