Pandas Merge columns of a pivot table

I generated a pandas pivot table that looks like this:

                                                                 Encounters
                                                     Code   132   133  145
  Record Number  Start_date  End_date  Service_Date
    2322            1/1/2017  1/3/2017  1/1/2017             0     1    1
                                        1/2/2017             1     0    0
                                        1/3/2017             0     1    1

I would like to merge some of the pivot table columns, summing their values, based on the Code.

Desired output:

                                                              Encounters
                                                     Code   132   133-145 
  Record Number  Start_date  End_date  Service_Date
    2322            1/1/2017  1/3/2017  1/1/2017             0      2    
                                        1/2/2017             1      0    
                                        1/3/2017             0      2    

Pivot tables create hierarchical columns (i.e., multiple levels). Hence, consider assigning a new sum column using a tuple key that names each level:

df[('Encounters', '133-145')] = df[('Encounters', '133')] + df[('Encounters', '145')] 

del df[('Encounters', '133')] 
del df[('Encounters', '145')] 

# sortlevel() is deprecated and absent from current pandas; sort_index() replaces it
df.sort_index(axis=1, inplace=True)
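
As a self-contained sketch of the steps above, the following rebuilds a table shaped like the one in the question and merges codes 133 and 145. The long-format `records` frame is hypothetical (the question does not show its source data), and it assumes the codes are stored as strings; if they are integers, the tuple keys would be `('Encounters', 133)` and so on. It uses `sort_index` where the snippet above calls `sortlevel`.

```python
import pandas as pd

# Hypothetical long-format input, chosen to reproduce the question's pivot table.
records = pd.DataFrame({
    "Record Number": [2322] * 5,
    "Start_date": ["1/1/2017"] * 5,
    "End_date": ["1/3/2017"] * 5,
    "Service_Date": ["1/1/2017", "1/1/2017", "1/2/2017", "1/3/2017", "1/3/2017"],
    "Code": ["133", "145", "132", "133", "145"],  # assumed to be strings
    "Encounters": [1, 1, 1, 1, 1],
})

# Passing values as a list yields two column levels: ('Encounters', <Code>).
pivot = records.pivot_table(
    index=["Record Number", "Start_date", "End_date", "Service_Date"],
    columns="Code",
    values=["Encounters"],
    aggfunc="sum",
    fill_value=0,
)

# Add the merged column with a tuple key, then drop the two source columns.
pivot[("Encounters", "133-145")] = (
    pivot[("Encounters", "133")] + pivot[("Encounters", "145")]
)
pivot = pivot.drop([("Encounters", "133"), ("Encounters", "145")], axis=1)

# Restore a sorted column order after appending the new column.
pivot = pivot.sort_index(axis=1)

print(pivot)
```

The new column is appended at the end of the frame, which is why the final sort is needed to place it alongside the remaining Code columns.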

To demonstrate with random data:

Data (seeded data with pivot)

import numpy as np
import pandas as pd
import datetime as dt
import time

LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')    
epoch_time = int(time.time())

np.random.seed(555)
df = pd.DataFrame({'ID': [np.random.randint(15) for _ in range(50)],
                   'GROUP': ["".join(np.random.choice(LETTERS[0:3],1)) for _ in range(50)],
                   # NOTE: a single scalar draw, so every row shares the same NUM value
                   'NUM': np.random.uniform(50)/100,
                   'DATE': [dt.datetime.fromtimestamp(np.random.randint(low=1400270738,
                            high=epoch_time)) for _ in range(50)]})

df['YEAR'] = df['DATE'].dt.year
pvtdf = df.pivot_table(index = ['ID'], columns = ['YEAR', 'GROUP'], values = ['NUM']).fillna(0)

print(pvtdf)
#             NUM                                                                                                              
# YEAR       2014                          2015                          2016                          2017                    
# GROUP         A         B         C         A         B         C         A         B         C         A         B         C
# ID                                                                                                                           
# 0      0.000000  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258  0.411258
# 1      0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000  0.411258  0.411258
# 3      0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258  0.411258  0.411258  0.000000  0.411258  0.000000
# 4      0.411258  0.411258  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000
# 5      0.411258  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258
# 6      0.000000  0.411258  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
# 7      0.000000  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
# 8      0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258
# 9      0.000000  0.000000  0.411258  0.411258  0.000000  0.411258  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000
# 10     0.000000  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000
# 11     0.000000  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000
# 12     0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
# 13     0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.411258  0.000000  0.411258  0.000000  0.411258  0.000000
# 14     0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258  0.000000

Process (all 2017 A, B, C columns added into D and then removed)

pvtdf[('NUM', 2017, 'D')] = pvtdf[('NUM', 2017, 'A')] + pvtdf[('NUM', 2017, 'B')] + pvtdf[('NUM', 2017, 'C')]

pvtdf = pvtdf.drop([('NUM', 2017, 'A'), ('NUM', 2017, 'B'), ('NUM', 2017, 'C')], axis=1)    
# sortlevel() is deprecated and absent from current pandas; sort_index() replaces it
pvtdf.sort_index(axis=1, inplace=True)

print(pvtdf)    
#             NUM                                                                                          
# YEAR       2014                          2015                          2016                          2017
# GROUP         A         B         C         A         B         C         A         B         C         D
# ID                                                                                                       
# 0      0.000000  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.822515
# 1      0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.411258  0.411258  0.000000  0.822515
# 3      0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258  0.411258  0.411258  0.411258
# 4      0.411258  0.411258  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000  0.411258  0.411258
# 5      0.411258  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258
# 6      0.000000  0.411258  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000  0.000000  0.000000
# 7      0.000000  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
# 8      0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258
# 9      0.000000  0.000000  0.411258  0.411258  0.000000  0.411258  0.411258  0.000000  0.000000  0.000000
# 10     0.000000  0.000000  0.000000  0.411258  0.411258  0.000000  0.000000  0.411258  0.000000  0.000000
# 11     0.000000  0.000000  0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.411258  0.000000
# 12     0.411258  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
# 13     0.000000  0.411258  0.000000  0.000000  0.000000  0.000000  0.411258  0.000000  0.411258  0.411258
# 14     0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.411258
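
When many columns need merging, adding them one by one gets verbose. One possible alternative, not part of the answer above, is to relabel the columns so that the ones to be merged share a label, and then sum the duplicates with a group-by on the transposed frame. The small `wide` frame and its values below are made up for illustration; they are not the random data shown earlier.

```python
import pandas as pd

# Hypothetical three-level columns in the same layout as pvtdf: (NUM, YEAR, GROUP).
columns = pd.MultiIndex.from_product(
    [["NUM"], [2016, 2017], ["A", "B", "C"]], names=[None, "YEAR", "GROUP"]
)
wide = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]],
    index=pd.Index([0, 1], name="ID"),
    columns=columns,
)

# Give every 2017 column the same GROUP label 'D'; leave other years untouched.
new_labels = [
    (num, year, "D") if year == 2017 else (num, year, group)
    for num, year, group in wide.columns
]
merged = wide.copy()
merged.columns = pd.MultiIndex.from_tuples(new_labels, names=wide.columns.names)

# Transpose so the column labels become row labels, sum rows that share a
# label, and transpose back. The group-by also returns the labels sorted.
merged = merged.T.groupby(level=[0, 1, 2]).sum().T

print(merged)
```

The transpose is used because grouping along columns directly (`groupby(..., axis=1)`) is deprecated in recent pandas releases.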
