[英]How to merge two columns of a pivot_table from pandas in python?
[英]Pandas Merge columns of a pivot table
我生成了一个熊猫数据透视表,如下所示:
Encounters
Code 132 133 145
Record Number Start_date End_date Service_Date
2322 1/1/2017 1/3/2017 1/1/2017 0 1 1
1/2/2017 1 0 0
1/3/2017 0 1 1
我想根据代码合并和汇总一些数据透视表列
所需的输出:
Encounters
Code 132 133-145
Record Number Start_date End_date Service_Date
2322 1/1/2017 1/3/2017 1/1/2017 0 2
1/2/2017 1 0
1/3/2017 0 2
数据透视表创建层次结构列(即,多个级别)。 因此,考虑使用元组分配为不同级别分配新的sum列:
df[('Encounters', '133-145')] = df[('Encounters', '133')] + df[('Encounters', '145')]
del df[('Encounters', '133')]
del df[('Encounters', '145')]
df.sortlevel(0, axis=1, inplace=True)
为了演示随机数据:
数据 (带透视的种子数据)
import numpy as np
import pandas as pd
import datetime as dt
import time
LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
epoch_time = int(time.time())
np.random.seed(555)
df = pd.DataFrame({'ID': [np.random.randint(15) for _ in range(50)],
'GROUP': ["".join(np.random.choice(LETTERS[0:3],1)) for _ in range(50)],
'NUM': np.random.uniform(50)/100,
'DATE': [dt.datetime.fromtimestamp(np.random.randint(low=1400270738,
high=epoch_time)) for _ in range(50)]})
df['YEAR'] = df['DATE'].dt.year
pvtdf = df.pivot_table(index = ['ID'], columns = ['YEAR', 'GROUP'], values = ['NUM']).fillna(0)
print(pvtdf)
# NUM
# YEAR 2014 2015 2016 2017
# GROUP A B C A B C A B C A B C
# ID
# 0 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258
# 1 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258
# 3 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.411258 0.000000 0.411258 0.000000
# 4 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000
# 5 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 6 0.000000 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 7 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 8 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 9 0.000000 0.000000 0.411258 0.411258 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000
# 10 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000
# 11 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000
# 12 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 13 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.411258 0.000000 0.411258 0.000000
# 14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000
流程 (将所有2017 A,B,C列添加到D中,然后删除)
pvtdf[('NUM', 2017, 'D')] = pvtdf[('NUM', 2017, 'A')] + pvtdf[('NUM', 2017, 'B')] + pvtdf[('NUM', 2017, 'C')]
pvtdf = pvtdf.drop([('NUM', 2017, 'A'), ('NUM', 2017, 'B'), ('NUM', 2017, 'C')], axis=1)
pvtdf.sortlevel(0, axis=1, inplace=True)
print(pvtdf)
# NUM
# YEAR 2014 2015 2016 2017
# GROUP A B C A B C A B C D
# ID
# 0 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.822515
# 1 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.822515
# 3 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.411258 0.411258
# 4 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258
# 5 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 6 0.000000 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000
# 7 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 8 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 9 0.000000 0.000000 0.411258 0.411258 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000
# 10 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000
# 11 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000
# 12 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 13 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.411258 0.411258
# 14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.