Pandas pivot table multi-layer sorting
I have the following df (UPDATED):
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "zz", "zz"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "xy", "zz"],
                   "Name": ["Peter", "Amy", "Brian", "Amy", "Amy",
                            "Peter", "Brian", "Peter", "Brian", "Brian", "Brian"],
                   "Year": [2019, 2019, 2019, 2019,
                            2019, 2019, 2020, 2020,
                            2020, 2019, 2020],
                   "Values": [20, 4, 20, 5, 6, 6, 8, 9, 9, 10, 5]})

# aggfunc='sum' is the string form of np.sum and avoids the FutureWarning newer pandas emits for np.sum
df_pivot = pd.pivot_table(df, values='Values', index=['Name', 'A', 'B'],
                          columns=['Year'], aggfunc='sum', fill_value=0,
                          margins=True, margins_name="Totals")
Once I pivot it the way I like, it looks like this:
Year            2019  2020  Totals
Name   A   B
Amy    foo one     4     0       4
           two    11     0      11
Brian  bar one     0     8       8
           two     0     9       9
       foo one    20     0      20
       zz  xy     10     0      10
           zz      0     5       5
Peter  bar one     6     0       6
           two     0     9       9
       foo one    20     0      20
Totals            71    31     102
Now the "fun" part begins.
I would like this pivot table to be sorted on all index columns, from left to right, based on the sum of values.
Let me explain.
First, I would like to sort the pivot table by column "Name" in descending order of each name's "Totals", so I calculate the sums: Amy = 15, Brian = 52, Peter = 35. From this I know the first column should be ordered Brian/Peter/Amy.
Now I do the same for the second column "A", but with the first column "Name" fixed.
I.e. for the name Brian (which is now on top), I calculate the totals for column "A" to see whether foo, bar or zz should come first: Brian-foo is 20, Brian-bar is 8 + 9 = 17 and Brian-zz is 10 + 5 = 15, so foo should come first for Brian in the second column, then bar, then zz. The same applies to the remaining index columns.
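The group totals quoted above can be read straight off the pivot's Totals column by grouping on index levels. A minimal sketch, assuming the updated df from the question and aggfunc="sum"; the names body, name_totals and brian_a_totals are illustrative only.

```python
import pandas as pd

# Updated data from the question.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "zz", "zz"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two", "xy", "zz"],
    "Name": ["Peter", "Amy", "Brian", "Amy", "Amy", "Peter",
             "Brian", "Peter", "Brian", "Brian", "Brian"],
    "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2019, 2020],
    "Values": [20, 4, 20, 5, 6, 6, 8, 9, 9, 10, 5],
})
df_pivot = pd.pivot_table(df, values="Values", index=["Name", "A", "B"],
                          columns=["Year"], aggfunc="sum", fill_value=0,
                          margins=True, margins_name="Totals")

body = df_pivot.iloc[:-1]  # leave out the margins row so it is not counted

# Level "Name": total per name, largest first (Brian 52, Peter 35, Amy 15)
name_totals = body["Totals"].groupby(level="Name").sum().sort_values(ascending=False)

# Level "A" with Name fixed to Brian: total per A value (foo 20, bar 17, zz 15)
brian_a_totals = (body.xs("Brian", level="Name")["Totals"]
                  .groupby(level="A").sum().sort_values(ascending=False))

print(name_totals)
print(brian_a_totals)
```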
The output should look like this:
Year            2019  2020  Totals
Name   A   B
Brian  foo one    20     0      20
       bar two     0     9       9
           one     0     8       8
       zz  xy     10     0      10
           zz      0     5       5
Peter  foo one    20     0      20
       bar two     0     9       9
           one     6     0       6
Amy    foo two    11     0      11
           one     4     0       4
Totals            71    31     102
Long story short: first I want to sort the first column by the totals of its items and fix that order; then I want to sort the second column by the totals of its items, but grouped according to the first sort, and so on.
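The rule described above (one group-total key per index level, outermost level first) can be sketched as a small helper. This is a minimal sketch, not a pandas built-in: sort_by_level_totals is a hypothetical name, and it assumes the margins row is the last row of the pivot and that every index level has a unique name. For each level it computes the total of the group formed by that level and all levels to its left, then sorts by those keys in descending order, using the level labels as tie-breakers so rows of the same group stay contiguous.

```python
import pandas as pd

# Updated data from the question.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "zz", "zz"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two", "xy", "zz"],
    "Name": ["Peter", "Amy", "Brian", "Amy", "Amy", "Peter",
             "Brian", "Peter", "Brian", "Brian", "Brian"],
    "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2019, 2020],
    "Values": [20, 4, 20, 5, 6, 6, 8, 9, 9, 10, 5],
})
df_pivot = pd.pivot_table(df, values="Values", index=["Name", "A", "B"],
                          columns=["Year"], aggfunc="sum", fill_value=0,
                          margins=True, margins_name="Totals")


def sort_by_level_totals(pivot, total_col="Totals"):
    """Hypothetical helper: sort every index level by descending group total.

    Assumes the last row is the margins row added by margins=True and that
    all index levels are named.
    """
    body, margin = pivot.iloc[:-1], pivot.iloc[[-1]]
    levels = list(body.index.names)

    # Index labels as ordinary columns; the RangeIndex holds row positions in body.
    keys = body.index.to_frame(index=False)
    keys["_row_total"] = body[total_col].to_numpy()

    by, ascending = [], []
    for i, level in enumerate(levels):
        key = f"_sum_{i}"
        # Total of the group formed by this level and every level to its left.
        keys[key] = keys.groupby(levels[: i + 1])["_row_total"].transform("sum")
        # Bigger groups first; the label breaks ties so each group stays together.
        by += [key, level]
        ascending += [False, True]

    order = keys.sort_values(by, ascending=ascending).index.to_numpy()
    return pd.concat([body.iloc[order], margin])


result = sort_by_level_totals(df_pivot)
print(result)
```

Because the number of keys follows the number of index levels, the same helper should work unchanged for a longer or shorter index list.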
Can you advise how to do this, please? I would really appreciate the help!
Thanks,
Pawel
You can use groupby.transform to get the sum within each name, then sort by it:
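# Sort everything except the margins row (the last row)
body = (df_pivot.iloc[:-1]
        # per-name total, repeated on every row of that name
        .assign(sort=lambda x: x['Totals'].groupby(level=0).transform('sum'))
        # biggest name first, keep each name's rows together, then biggest row first
        .sort_values(['sort', 'Name', 'Totals'],
                     ascending=[False, True, False], kind='mergesort')
        .drop('sort', axis=1)
        )
# DataFrame.append was removed in pandas 2.0, so re-attach the margins row with pd.concat
df_pivot = pd.concat([body, df_pivot.iloc[[-1]]])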
Output (on the data as it was before the question was updated, so there are no zz rows):
Year            2019  2020  Totals
Name   A   B
Brian  foo one    20     0      20
       bar two     0     9       9
           one     0     8       8
Peter  foo one    20     0      20
       bar two     0     9       9
           one     6     0       6
Amy    foo two    11     0      11
           one     4     0       4
Totals            61    26      87
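The answer's chain builds a group key only for the first level; rows inside each name are then ordered by their own Totals. That reproduces the output shown above, but on the updated frame Brian's zz rows get split around the bar rows, because zz xy (10) sorts between foo one (20) and bar two (9). A minimal sketch of the same chain with a second group key for (Name, A), assuming the updated df; sort1 and sort2 are illustrative column names, and with only three index levels the row's own Totals is enough for the last level "B".

```python
import pandas as pd

# Updated data from the question.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "zz", "zz"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two", "xy", "zz"],
    "Name": ["Peter", "Amy", "Brian", "Amy", "Amy", "Peter",
             "Brian", "Peter", "Brian", "Brian", "Brian"],
    "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2019, 2020],
    "Values": [20, 4, 20, 5, 6, 6, 8, 9, 9, 10, 5],
})
df_pivot = pd.pivot_table(df, values="Values", index=["Name", "A", "B"],
                          columns=["Year"], aggfunc="sum", fill_value=0,
                          margins=True, margins_name="Totals")
body = df_pivot.iloc[:-1]  # everything except the margins row

# The answer's chain: one group key (per-name total), then each row's own Totals.
one_key = (body
           .assign(sort=lambda x: x["Totals"].groupby(level=0).transform("sum"))
           .sort_values(["sort", "Name", "Totals"], ascending=[False, True, False]))
# For Brian this gives foo/one, zz/xy, bar/two, bar/one, zz/zz: the zz group is split.

# Adding a (Name, A) group total as a second key, with the A label as tie-breaker.
two_keys = (body
            .assign(sort1=lambda x: x["Totals"].groupby(level=0).transform("sum"),
                    sort2=lambda x: x["Totals"].groupby(level=[0, 1]).transform("sum"))
            .sort_values(["sort1", "Name", "sort2", "A", "Totals"],
                         ascending=[False, True, False, True, False])
            .drop(columns=["sort1", "sort2"]))
result = pd.concat([two_keys, df_pivot.iloc[[-1]]])
print(result)
```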