How to create new columns with groupBy and transform from rows
I have this DataFrame:
import pandas as pd
import numpy as np
from pandas import DataFrame
df3 = pd.DataFrame({
'MONTHYEAR' : ['2021/01', '2021/02', '2021/03', '2021/01', '2021/02', '2021/03', '2022/01'],
'CATEGORY' : ['INCOME', 'INCOME', 'INCOME', 'INCOME', 'INCOME', 'INCOME', 'INCOME'],
'SUBCATEGORY': ['INCOME HD', 'INCOME HD', 'INCOME HD', 'INCOME AD','INCOME AD','INCOME AD', 'INCOME AD'],
'AMOUNT': [1000, 2000, 3000, 4000, 5000, 6000, 7000]
})
I want to add 3 new columns: HD, AD and SUM (called TOTAL in the code below):
df3['HD'] = 0
df3['AD'] = 0
df3['TOTAL'] = 0
df3['TOTAL'] = df3['AMOUNT'].groupby(df3['MONTHYEAR']).transform('sum')
df3.loc[df3['SUBCATEGORY'] == "INCOME HD", 'HD'] = df3['AMOUNT']
df3.loc[df3['SUBCATEGORY'] == "INCOME AD", 'AD'] = df3['AMOUNT']
df3
So far, I get this:
But what I want is this:
Any help is much appreciated!
First use DataFrame.pivot_table, then rename the columns, create the new column with sum, and finally convert the MultiIndex into columns:
df1 = (df3.pivot_table(index=['MONTHYEAR','CATEGORY'],
                       columns='SUBCATEGORY',
                       values='AMOUNT',
                       aggfunc='sum',
                       fill_value=0)
          .rename(columns={'INCOME AD':'AD','INCOME HD':'HD'})
          [['HD','AD']]
          .assign(TOTAL = lambda x: x.sum(axis=1))
          .reset_index()
          .rename_axis(None, axis=1)
)
print (df1)
  MONTHYEAR CATEGORY    HD    AD  TOTAL
0   2021/01   INCOME  1000  4000   5000
1   2021/02   INCOME  2000  5000   7000
2   2021/03   INCOME  3000  6000   9000
3   2022/01   INCOME     0  7000   7000
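The same chain can be written step by step, which makes each intermediate shape easier to inspect. This is only a minimal sketch, assuming the sample df3 from the question; the variable name `wide` is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df3 = pd.DataFrame({
    'MONTHYEAR': ['2021/01', '2021/02', '2021/03', '2021/01', '2021/02', '2021/03', '2022/01'],
    'CATEGORY': ['INCOME'] * 7,
    'SUBCATEGORY': ['INCOME HD', 'INCOME HD', 'INCOME HD',
                    'INCOME AD', 'INCOME AD', 'INCOME AD', 'INCOME AD'],
    'AMOUNT': [1000, 2000, 3000, 4000, 5000, 6000, 7000],
})

# Step 1: one row per (MONTHYEAR, CATEGORY), one column per SUBCATEGORY.
# Missing combinations (no 'INCOME HD' in 2022/01) are filled with 0.
wide = df3.pivot_table(index=['MONTHYEAR', 'CATEGORY'],
                       columns='SUBCATEGORY',
                       values='AMOUNT',
                       aggfunc='sum',
                       fill_value=0)

# Step 2: shorten the column names and put HD before AD.
wide = wide.rename(columns={'INCOME AD': 'AD', 'INCOME HD': 'HD'})[['HD', 'AD']]

# Step 3: row-wise total of the HD and AD columns.
wide = wide.assign(TOTAL=wide.sum(axis=1))

# Step 4: move the MultiIndex back into ordinary columns and
# drop the leftover 'SUBCATEGORY' label on the column axis.
df1 = wide.reset_index().rename_axis(None, axis=1)

print(df1)
```

Printing `wide` after each step shows how the index and columns change before the final reset_index call flattens the result.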
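You can also do this with the .agg() function, applied to df3 after the HD, AD and TOTAL columns from the question have been added. Here is the code: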
df3 = df3.groupby(['MONTHYEAR']).agg({'CATEGORY':'first', 'HD':'sum', 'AD':'sum', 'TOTAL':'first'}).reset_index()
The output will look like this:
  MONTHYEAR CATEGORY    HD    AD  TOTAL
0   2021/01   INCOME  1000  4000   5000
1   2021/02   INCOME  2000  5000   7000
2   2021/03   INCOME  3000  6000   9000
3   2022/01   INCOME     0  7000   7000
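For reference, here is a self-contained sketch of the .agg() route, starting from the raw data. It assumes the HD, AD and TOTAL columns are built the same way as in the question; the names `is_hd`, `is_ad` and `result` are introduced here for illustration. Taking 'first' for TOTAL works because transform('sum') has already written the same monthly total on every row of a group.

```python
import pandas as pd

# Sample data copied from the question.
df3 = pd.DataFrame({
    'MONTHYEAR': ['2021/01', '2021/02', '2021/03', '2021/01', '2021/02', '2021/03', '2022/01'],
    'CATEGORY': ['INCOME'] * 7,
    'SUBCATEGORY': ['INCOME HD', 'INCOME HD', 'INCOME HD',
                    'INCOME AD', 'INCOME AD', 'INCOME AD', 'INCOME AD'],
    'AMOUNT': [1000, 2000, 3000, 4000, 5000, 6000, 7000],
})

# Per-row helper columns: AMOUNT where the subcategory matches, otherwise 0.
is_hd = df3['SUBCATEGORY'] == 'INCOME HD'
is_ad = df3['SUBCATEGORY'] == 'INCOME AD'
df3['HD'] = df3['AMOUNT'].where(is_hd, 0)
df3['AD'] = df3['AMOUNT'].where(is_ad, 0)

# transform('sum') keeps the original row count and repeats
# the monthly total on every row of the same MONTHYEAR.
df3['TOTAL'] = df3.groupby('MONTHYEAR')['AMOUNT'].transform('sum')

# Collapse to one row per month: sum the split amounts,
# keep the first CATEGORY and the (already repeated) TOTAL.
result = (df3.groupby('MONTHYEAR')
             .agg({'CATEGORY': 'first', 'HD': 'sum', 'AD': 'sum', 'TOTAL': 'first'})
             .reset_index())

print(result)
```

Note that this groups by MONTHYEAR only, so it relies on each month having a single CATEGORY, as in the sample data.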