Sorting pivot table (multi index)
I'm trying to sort a pivot table's values in descending order after putting two "row labels" (Excel term) on the pivot.
Sample data:
import numpy as np
import pandas as pd

x = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
                  'col2': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
                  'col3': [1, .67, 0.5, 2, .65, .75, 2.25, 2.5, .5, 2, 2.75]})
print(x)
   col1  col2  col3
0     a     1  1.00
1     a     1  0.67
2     b     1  0.50
3     c     1  2.00
4     c     1  0.65
5     a     2  0.75
6     b     2  2.25
7     c     2  2.50
8     a     3  0.50
9     b     3  2.00
10    c     3  2.75
To create the pivot, I'm using the following function:
pt = pd.pivot_table(x, index = ['col1', 'col2'], values = 'col3', aggfunc = np.sum)
print(pt)
           col3
col1 col2
a    1     1.67
     2     0.75
     3     0.50
b    1     0.50
     2     2.25
     3     2.00
c    1     2.65
     2     2.50
     3     2.75
In words, this variable pt is first sorted by col1, then by the values of col2 within col1, then by col3 within all of those. This is great, but I would like to sort by col3 (the values) while keeping the groups that were broken out in col2 (this column can be in any order and shuffled around).
The target output would look something like this (col3 in descending order, with col2 in any order, within each group of col1):
           col3
col1 col2
a    1     1.67
     2     0.75
     3     0.50
b    2     2.25
     3     2.00
     1     0.50
c    3     2.75
     1     2.65
     2     2.50
I have tried the code below, but this just sorts the entire pivot table's values and loses the grouping (I'm looking for sorting within each group).
pt.sort_values(by = 'col3', ascending = False)
For guidance, a similar question was asked (and answered) here, but I was unable to get a successful output with the provided answer:
Pandas: Sort pivot table
The error I get from that answer is ValueError: all keys need to be the same shape
You need reset_index to turn the index levels back into DataFrame columns, then sort_values by col1 and col3, and finally set_index to restore the MultiIndex:
# Parentheses are needed so the method chain can span several lines.
df = (pt.reset_index()
        .sort_values(['col1', 'col3'], ascending=[True, False])
        .set_index(['col1', 'col2']))
print(df)
           col3
col1 col2
a    1     1.67
     2     0.75
     3     0.50
b    2     2.25
     3     2.00
     1     0.50
c    3     2.75
     1     2.65
     2     2.50
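For reference, here is a self-contained sketch that rebuilds the sample data and runs the reset_index / sort_values / set_index chain end to end. It also tries a shorter variant that is not part of the answer above: on pandas 0.23 and later, sort_values should accept index level names in `by`, so the round trip through reset_index may be unnecessary. The names via_reset and via_levels are illustrative only, and aggfunc='sum' is assumed here to be equivalent to the np.sum used in the question.

```python
import pandas as pd

# Sample data from the question.
x = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
    'col2': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'col3': [1, .67, .5, 2, .65, .75, 2.25, 2.5, .5, 2, 2.75],
})

# Assumption: the string 'sum' aggregates the same way as np.sum here.
pt = pd.pivot_table(x, index=['col1', 'col2'], values='col3', aggfunc='sum')

# Approach from the answer: flatten the index, sort, rebuild the MultiIndex.
via_reset = (pt.reset_index()
               .sort_values(['col1', 'col3'], ascending=[True, False])
               .set_index(['col1', 'col2']))

# Assumed alternative (pandas >= 0.23): refer to the index level 'col1'
# directly in `by`, alongside the value column 'col3'.
via_levels = pt.sort_values(['col1', 'col3'], ascending=[True, False])

print(via_levels)
```

Both variants sort col1 ascending first, so the groups stay together, and only then sort col3 descending within each group; col2 simply follows its rows.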