Sort pandas pivot table by sum of rows and columns
I have (for example) this DataFrame:
COLUMN1 COLUMN2 VALUE
0 0102 1020 1
1 0102 1220 8
2 0102 1210 2
3 0103 1020 1
4 0103 1210 3
5 0103 1222 8
6 0104 1020 3
7 0104 1120 2
(In reality, it's ~9000 rows long.)
From this, I create a pivot table where the index is COLUMN1, the columns are COLUMN2, and the values come from VALUE, with NaN filled by 0.
COLUMN2 1020 1120 1210 1220 1222
COLUMN1
0102 1 0 2 8 0
0103 1 0 3 0 8
0104 3 2 0 0 0
I have to sort this pivot by the grand total of rows, then by the grand total of columns. That would look like this:
COLUMN2 1220 1222 1020 1210 1120| (GT)
COLUMN1 | HIGHEST
0103 0 8 1 3 0| (12) |
0102 8 0 1 2 0| (11) |
0104 0 0 3 0 2| (5) V
--------------------------------------
(GT: 8 8 5 5 2)
HIGHEST----------------------------> LOWEST
Is there a way to do this? I have tried creating the pivot by supplying the index and columns as lists, sorted in the order I would like them to appear, but pandas seems to automatically sort them A-Z when creating the table.
Code for the example:
import pandas as pd
exampledata=[['0102','1020',1],['0102','1220',8],['0102','1210',2],
['0103','1020',1],['0103','1210',3], ['0103','1222',8],
['0104','1020',3],['0104','1120',2]]
df = pd.DataFrame(exampledata,columns=['COLUMN1','COLUMN2','VALUE'])
print(df)
pivot = pd.pivot_table(df,
                       index='COLUMN1',
                       columns='COLUMN2',
                       values='VALUE',
                       aggfunc='sum',
                       fill_value=0)
print(pivot)
pivot_table has an option, margins, which is convenient for this case:
(df.pivot_table(index='COLUMN1', columns='COLUMN2', values='VALUE',
                aggfunc='sum', fill_value=0, margins=True)   # pivot with margins
   .sort_values('All', ascending=False)                      # sort by row sum
   .drop('All', axis=1)                                      # drop column `All`
   .sort_values('All', ascending=False, axis=1)              # sort by column sum
   .drop('All')                                              # drop row `All`
)
Output:
COLUMN2 1220 1222 1020 1210 1120
COLUMN1
103 0 8 1 3 0
102 8 0 1 2 0
104 0 0 3 0 2
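If you need this in more than one place, the chain above could be wrapped in a small helper. This is only a sketch: the name sort_pivot_by_totals is made up for illustration, and it assumes the values are non-negative so that the 'All' margin row carries the largest total and survives the first sort.

```python
import pandas as pd

def sort_pivot_by_totals(df, index, columns, values):
    """Hypothetical helper: pivot with margins, then order rows and
    columns by their grand totals, highest first."""
    return (
        df.pivot_table(index=index, columns=columns, values=values,
                       aggfunc='sum', fill_value=0, margins=True)
          .sort_values('All', ascending=False)          # rows by the 'All' column
          .drop(columns='All')                          # margin column no longer needed
          .sort_values('All', axis=1, ascending=False)  # columns by the 'All' row
          .drop(index='All')                            # margin row no longer needed
    )

exampledata = [['0102', '1020', 1], ['0102', '1220', 8], ['0102', '1210', 2],
               ['0103', '1020', 1], ['0103', '1210', 3], ['0103', '1222', 8],
               ['0104', '1020', 3], ['0104', '1120', 2]]
df = pd.DataFrame(exampledata, columns=['COLUMN1', 'COLUMN2', 'VALUE'])

result = sort_pivot_by_totals(df, 'COLUMN1', 'COLUMN2', 'VALUE')
print(result)
```

Columns with equal totals (1220/1222 and 1020/1210 in the example) may come out in either order, since nothing here breaks ties.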
I would try something like this:
pivot['sum_cols'] = pivot.sum(axis=1)
pivot = pivot.sort_values('sum_cols' , ascending=False)
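The snippet above only orders the rows and leaves a helper column in the table. A possible extension, sketched here under the assumption that the same idea is wanted for the columns too, is to compute both orders from the sums and reindex with .loc, so no extra column is added. The names row_order, col_order and sorted_pivot are just illustrative.

```python
import pandas as pd

exampledata = [['0102', '1020', 1], ['0102', '1220', 8], ['0102', '1210', 2],
               ['0103', '1020', 1], ['0103', '1210', 3], ['0103', '1222', 8],
               ['0104', '1020', 3], ['0104', '1120', 2]]
df = pd.DataFrame(exampledata, columns=['COLUMN1', 'COLUMN2', 'VALUE'])

pivot = df.pivot_table(index='COLUMN1', columns='COLUMN2', values='VALUE',
                       aggfunc='sum', fill_value=0)

# Labels ordered by total, highest first
row_order = pivot.sum(axis=1).sort_values(ascending=False).index
col_order = pivot.sum(axis=0).sort_values(ascending=False).index

# Reindex rows and columns in one step; the pivot itself gets no extra column
sorted_pivot = pivot.loc[row_order, col_order]
print(sorted_pivot)
```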
The index and columns of your pivot table (values from COLUMN1 and COLUMN2) are of type string, and strings are sorted from A to Z. Perhaps you should use integer labels instead, and then the sorting will be done numerically. According to the pivot_table documentation, integer type is allowed for columns and index.
df = df.astype('int')
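One thing to keep in mind with this conversion: codes such as '0102' lose their leading zero when cast to integers. A small sketch of the effect, and of one way to get the padded labels back afterwards. The width of 4 passed to zfill is an assumption based on the example data.

```python
import pandas as pd

df = pd.DataFrame([['0102', '1020', 1], ['0103', '1210', 3]],
                  columns=['COLUMN1', 'COLUMN2', 'VALUE'])

# Cast only the label columns; '0102' becomes the integer 102
converted = df.astype({'COLUMN1': int, 'COLUMN2': int})
print(converted['COLUMN1'].tolist())

# Restore zero-padded text labels (assumes every code is 4 characters wide)
restored = converted['COLUMN1'].astype(str).str.zfill(4)
print(restored.tolist())
```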
Now, your pivot_table function outputs a DataFrame, which you can sort by index or by columns in the same manner as any other DataFrame.
According to the sort_index documentation, to sort the index you should do:
pivot = pivot.sort_index(ascending=False)
To sort the columns you should do:
pivot = pivot.sort_index(axis=1, ascending=False)
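Note that sort_index orders by the labels themselves, not by the totals. A quick sketch on the question's example data shows the difference (by_label is just an illustrative name):

```python
import pandas as pd

exampledata = [['0102', '1020', 1], ['0102', '1220', 8], ['0102', '1210', 2],
               ['0103', '1020', 1], ['0103', '1210', 3], ['0103', '1222', 8],
               ['0104', '1020', 3], ['0104', '1120', 2]]
df = pd.DataFrame(exampledata, columns=['COLUMN1', 'COLUMN2', 'VALUE'])

pivot = df.pivot_table(index='COLUMN1', columns='COLUMN2', values='VALUE',
                       aggfunc='sum', fill_value=0)

# Descending by label on both axes
by_label = pivot.sort_index(ascending=False).sort_index(axis=1, ascending=False)
print(by_label)

# Row totals in label order are not in descending order
print(by_label.sum(axis=1).tolist())
```

So for the grand-total ordering asked about in the question, the sums still have to be used as the sort key.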