简体   繁体   English

熊猫数据透视表:列顺序和小计

[英]Pandas pivot table: columns order and subtotals

I'm using Pandas 0.19. 我正在使用Pandas 0.19。

Considering the following data frame: 考虑以下数据帧:

FID  admin0  admin1  admin2  windspeed  population
0    cntry1  state1  city1   60km/h     700
1    cntry1  state1  city1   90km/h     210
2    cntry1  state1  city2   60km/h     100
3    cntry1  state2  city3   60km/h     70
4    cntry1  state2  city4   60km/h     180
5    cntry1  state2  city4   90km/h     370
6    cntry2  state3  city5   60km/h     890
7    cntry2  state3  city6   60km/h     120
8    cntry2  state3  city6   90km/h     420
9    cntry2  state3  city6   120km/h    360
10   cntry2  state4  city7   60km/h     740

How can I create a table like this one? 如何创建这样的表?

                                population
                         60km/h  90km/h  120km/h
admin0  admin1  admin2  
cntry1  state1  city1    700     210      0
cntry1  state1  city2    100     0        0
cntry1  state2  city3    70      0        0
cntry1  state2  city4    180     370      0
cntry2  state3  city5    890     0        0
cntry2  state3  city6    120     420      360
cntry2  state4  city7    740     0        0

I have tried with the following pivot table: 我尝试使用以下数据透视表:

table = pd.pivot_table(df,index=["admin0","admin1","admin2"], columns=["windspeed"], values=["population"],fill_value=0)

In general it works great, but unfortunately I am not able to sort the new columns in the right order: the 120km/h column appears before the ones for 60km/h and 90km/h. 通常,它的效果很好,但是很遗憾,我无法以正确的顺序对新列进行排序:120km / h列出现在60km / h和90km / h的列之前。 How can I specify the order of the new columns? 如何指定新列的顺序?

Moreover, as a second step I need to add subtotals both for admin0 and admin1. 此外,作为第二步,我需要为admin0和admin1添加小计。 Ideally, the table I need should be like this: 理想情况下,我需要的表应如下所示:

                                population
                         60km/h  90km/h  120km/h
admin0  admin1  admin2  
cntry1  state1  city1    700     210      0
cntry1  state1  city2    100     0        0
        SUM state1       800     210      0
cntry1  state2  city3    70      0        0
cntry1  state2  city4    180     370      0
        SUM state2       250     370      0
SUM cntry1               1050    580      0
cntry2  state3  city5    890     0        0
cntry2  state3  city6    120     420      360
        SUM state3       1010    420      360
cntry2  state4  city7    740     0        0
        SUM state4       740     0        0
SUM cntry2               1750    420      360
SUM ALL                  2800    1000    360

you can do it using reindex() method and custom sorting: 您可以使用reindex()方法和自定义排序来做到这一点:

In [26]: table
Out[26]:
                     population
windspeed               120km/h 60km/h 90km/h
admin0 admin1 admin2
cntry1 state1 city1           0    700    210
              city2           0    100      0
       state2 city3           0     70      0
              city4           0    180    370
cntry2 state3 city5           0    890      0
              city6         360    120    420
       state4 city7           0    740      0

In [27]: cols = sorted(table.columns.tolist(), key=lambda x: int(x[1].replace('km/h','')))

In [28]: cols
Out[28]: [('population', '60km/h'), ('population', '90km/h'), ('population', '120km/h')]

In [29]: table = table.reindex(columns=cols)

In [30]: table
Out[30]:
                     population
windspeed                60km/h 90km/h 120km/h
admin0 admin1 admin2
cntry1 state1 city1         700    210       0
              city2         100      0       0
       state2 city3          70      0       0
              city4         180    370       0
cntry2 state3 city5         890      0       0
              city6         120    420     360
       state4 city7         740      0       0

Solution with subtotals and MultiIndex.from_arrays . 小计和MultiIndex.from_arrays解决方案。 Last concat and all Dataframes , sort_index and add all sum : 最后一个concat和所有Dataframessort_index并加sum

#replace km/h and convert to int
df.windspeed = df.windspeed.str.replace('km/h','').astype(int)
print (df)
    FID  admin0  admin1 admin2  windspeed  population
0     0  cntry1  state1  city1         60         700
1     1  cntry1  state1  city1         90         210
2     2  cntry1  state1  city2         60         100
3     3  cntry1  state2  city3         60          70
4     4  cntry1  state2  city4         60         180
5     5  cntry1  state2  city4         90         370
6     6  cntry2  state3  city5         60         890
7     7  cntry2  state3  city6         60         120
8     8  cntry2  state3  city6         90         420
9     9  cntry2  state3  city6        120         360
10   10  cntry2  state4  city7         60         740

#pivoting
table = pd.pivot_table(df,
                       index=["admin0","admin1","admin2"], 
                       columns=["windspeed"], 
                       values=["population"],
                       fill_value=0)
print (table)
                    population          
windspeed                   60   90   120
admin0 admin1 admin2                     
cntry1 state1 city1         700  210    0
              city2         100    0    0
       state2 city3          70    0    0
              city4         180  370    0
cntry2 state3 city5         890    0    0
              city6         120  420  360
       state4 city7         740    0    0
#groupby and create sum dataframe by levels 0,1
df1 = table.groupby(level=[0,1]).sum()
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), 
                                       df1.index.get_level_values(1)+ '_sum', 
                                       len(df1.index) * ['']])
print (df1)
                   population          
windspeed                 60   90   120
admin0                                 
cntry1 state1_sum         800  210    0
       state2_sum         250  370    0
cntry2 state3_sum        1010  420  360
       state4_sum         740    0    0

df2 = table.groupby(level=0).sum()
df2.index = pd.MultiIndex.from_arrays([df2.index.values + '_sum',
                                       len(df2.index) * [''], 
                                       len(df2.index) * ['']])
print (df2)
             population          
windspeed           60   90   120
cntry1_sum         1050  580    0
cntry2_sum         1750  420  360

#concat all dataframes together, sort index
df = pd.concat([table, df1, df2]).sort_index(level=[0])
#add km/h to second level in columns
df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                       df.columns.get_level_values(1).astype(str) + 'km/h'])

#add all sum
df.loc[('All_sum','','')] = table.sum().values
print (df)
                             population               
                                 60km/h 90km/h 120km/h
admin0     admin1     admin2                          
cntry1     state1     city1         700    210       0
                      city2         100      0       0
           state1_sum               800    210       0
           state2     city3          70      0       0
                      city4         180    370       0
           state2_sum               250    370       0
cntry1_sum                         1050    580       0
cntry2     state3     city5         890      0       0
                      city6         120    420     360
           state3_sum              1010    420     360
           state4     city7         740      0       0
           state4_sum               740      0       0
cntry2_sum                         1750    420     360
All_sum                            2800   1000     360

EDIT by comment: 通过评论编辑:

def f(x):
    print (x)
    if (len(x) > 1):
        return x.sum()

df1 = table.groupby(level=[0,1]).apply(f).dropna(how='all')
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), 
                                       df1.index.get_level_values(1)+ '_sum', 
                                       len(df1.index) * ['']])
print (df1)
                   population              
windspeed                 60     90     120
admin0                                     
cntry1 state1_sum       800.0  210.0    0.0
       state2_sum       250.0  370.0    0.0
cntry2 state3_sum      1010.0  420.0  360.0

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM