Sort the DataFrame groups while retaining original rows

Here is the original dataframe:

Name Version Cost
A    0.0.3   1.7
C    0.0.2   2.5
A    0.0.1   1.0
C    0.0.1   2.4
B    0.0.2   3.7
B    0.0.1   3.5
A    0.0.2   1.4
C    0.0.3   2.6
B    0.0.3   3.8

After grouping and sorting within groups using the following code:

df = df.sort_values(['Name', 'Version'], ascending=[True, False])
df = df.groupby(['Name'], sort=False)
df = df.apply(lambda x: x.sort_values(['Cost'], ascending=False))

Now I have this dataframe, where Cost is sorted within each group and the groups are ordered alphabetically:

     Name Version Cost
Name
A    A    0.0.3   1.7
     A    0.0.2   1.4
     A    0.0.1   1.0
B    B    0.0.3   3.8
     B    0.0.2   3.7
     B    0.0.1   3.5
C    C    0.0.3   2.6
     C    0.0.2   2.5
     C    0.0.1   2.4

The question is: I would now like to sort the groups by the total cost of each group, so that the expected result looks like this:

Name Version Cost
B    0.0.3   3.8
B    0.0.2   3.7
B    0.0.1   3.5
C    0.0.3   2.6
C    0.0.2   2.5
C    0.0.1   2.4
A    0.0.3   1.7
A    0.0.2   1.4
A    0.0.1   1.0

How can I achieve this without losing the rows?

You can create a temporary column holding each group's total, sort by it, and then drop that column:

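# Broadcast each group's total cost onto its rows, sort by it, then drop the helper.
df["tmp"] = df.groupby("Name")["Cost"].transform("sum")
# Pass the column by keyword: the positional axis argument was removed in pandas 2.0.
df = df.sort_values(by="tmp", ascending=False).drop(columns="tmp")
print(df)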

Prints:

  Name Version  Cost
3    B   0.0.3   3.8
4    B   0.0.2   3.7
5    B   0.0.1   3.5
6    C   0.0.3   2.6
7    C   0.0.2   2.5
8    C   0.0.1   2.4
0    A   0.0.3   1.7
1    A   0.0.2   1.4
2    A   0.0.1   1.0

df used:

Name Version Cost
A    0.0.3   1.7
A    0.0.2   1.4
A    0.0.1   1.0
B    0.0.3   3.8
B    0.0.2   3.7
B    0.0.1   3.5
C    0.0.3   2.6
C    0.0.2   2.5
C    0.0.1   2.4
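For readers starting from the question's unsorted data, here is a minimal end-to-end sketch of the same temporary-column idea. The `kind="stable"` argument and the final `reset_index` are additions not present in the answer above; the stable sort is assumed here so that the within-group order from the first step survives the second sort.

```python
import pandas as pd

# The question's original, unsorted dataframe.
df = pd.DataFrame({
    "Name": ["A", "C", "A", "C", "B", "B", "A", "C", "B"],
    "Version": ["0.0.3", "0.0.2", "0.0.1", "0.0.1", "0.0.2",
                "0.0.1", "0.0.2", "0.0.3", "0.0.3"],
    "Cost": [1.7, 2.5, 1.0, 2.4, 3.7, 3.5, 1.4, 2.6, 3.8],
})

# Step 1: order rows within each group by Cost, highest first.
df = df.sort_values(["Name", "Cost"], ascending=[True, False])

# Step 2: broadcast each group's total cost back onto its rows.
df["tmp"] = df.groupby("Name")["Cost"].transform("sum")

# Step 3: order groups by total. A stable sort keeps tied rows
# (all rows of one group) in the order produced by step 1.
result = (df.sort_values("tmp", ascending=False, kind="stable")
            .drop(columns="tmp")
            .reset_index(drop=True))
print(result)
```

No row is dropped at any point: `transform` only adds a column, and the sorts only reorder.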

Starting from your original dataframe, you can generate a helper column of group sums with transform, then sort by that column and by the Version column, both in descending order:

group_sums = df.groupby("Name").Cost.transform("sum")
out = (df.assign(sorter=group_sums)
         .sort_values(["sorter", "Version"], ascending=False, ignore_index=True)
         .drop(columns="sorter"))

where we drop the helper column sorter after sorting, to get

>>> out

  Name Version  Cost
0    B   0.0.3   3.8
1    B   0.0.2   3.7
2    B   0.0.1   3.5
3    C   0.0.3   2.6
4    C   0.0.2   2.5
5    C   0.0.1   2.4
6    A   0.0.3   1.7
7    A   0.0.2   1.4
8    A   0.0.1   1.0
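The helper column works because `transform` returns one value per row rather than one value per group. A small sketch of that difference, using a made-up three-row frame (`toy` is purely illustrative and not part of the question's data):

```python
import pandas as pd

toy = pd.DataFrame({"Name": ["A", "B", "A"], "Cost": [1.0, 5.0, 2.0]})

# sum() collapses each group to a single value, indexed by group label.
per_group = toy.groupby("Name")["Cost"].sum()

# transform("sum") keeps the original index and repeats each
# group's total on every row that belongs to that group.
per_row = toy.groupby("Name")["Cost"].transform("sum")

print(per_group.to_dict())  # {'A': 3.0, 'B': 5.0}
print(per_row.tolist())     # [3.0, 5.0, 3.0]
```

Because `per_row` lines up with the original rows, it can be assigned as a column and used as a sort key without losing any rows.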

You can use the key argument in sort_values to achieve the same result as the other answers:

print (df.sort_values("Cost", ascending=False,
                      key=lambda _: df.groupby("Name")["Cost"].transform("sum")))

  Name Version  Cost
3    B   0.0.3   3.8
4    B   0.0.2   3.7
5    B   0.0.1   3.5
6    C   0.0.3   2.6
7    C   0.0.2   2.5
8    C   0.0.1   2.4
0    A   0.0.3   1.7
1    A   0.0.2   1.4
2    A   0.0.1   1.0
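The `key` callable is handed the column being sorted as a Series and is expected to return something of the same length; the lambda above ignores its argument and reads `df` directly. A variant that uses the passed column is sketched below. The name `group_total` and the `kind="stable"` argument are additions for illustration, and grouping by a NumPy array is assumed here so that rows are matched by position rather than by index label.

```python
import pandas as pd

# The already-grouped data from the "df used" table above.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Version": ["0.0.3", "0.0.2", "0.0.1"] * 3,
    "Cost": [1.7, 1.4, 1.0, 3.8, 3.7, 3.5, 2.6, 2.5, 2.4],
})

def group_total(cost: pd.Series) -> pd.Series:
    # `cost` is the "Cost" column passed in by sort_values.
    # Grouping by the Name values (as a plain array, matched by
    # position) and transforming yields one group total per row.
    return cost.groupby(df["Name"].to_numpy()).transform("sum")

# Rows are ordered by their group's total; the stable sort keeps
# rows with equal totals in their existing order.
out = df.sort_values("Cost", ascending=False, key=group_total, kind="stable")
print(out)
```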

Try:

df.assign(groupsum=df.groupby(level=0)['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=False)

Output:

       Name Version  Cost  groupsum
Name                               
B    8    B   0.0.3   3.8      11.0
     4    B   0.0.2   3.7      11.0
     5    B   0.0.1   3.5      11.0
C    7    C   0.0.3   2.6       7.5
     1    C   0.0.2   2.5       7.5
     3    C   0.0.1   2.4       7.5
A    0    A   0.0.3   1.7       4.1
     6    A   0.0.2   1.4       4.1
     2    A   0.0.1   1.0       4.1

And you can add reset_index(drop=True) at the end:

  Name Version  Cost  groupsum
0    B   0.0.3   3.8      11.0
1    B   0.0.2   3.7      11.0
2    B   0.0.1   3.5      11.0
3    C   0.0.3   2.6       7.5
4    C   0.0.2   2.5       7.5
5    C   0.0.1   2.4       7.5
6    A   0.0.3   1.7       4.1
7    A   0.0.2   1.4       4.1
8    A   0.0.1   1.0       4.1

Or, using your "original" dataframe above:

df.assign(groupsum=df.groupby('Name')['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=[False,False])

Output:

  Name Version  Cost  groupsum
8    B   0.0.3   3.8      11.0
4    B   0.0.2   3.7      11.0
5    B   0.0.1   3.5      11.0
7    C   0.0.3   2.6       7.5
1    C   0.0.2   2.5       7.5
3    C   0.0.1   2.4       7.5
0    A   0.0.3   1.7       4.1
6    A   0.0.2   1.4       4.1
2    A   0.0.1   1.0       4.1
