Pandas multiple index with multiple aggregate functions
Given this sample DataFrame:
+------+--------+------+-------+------+--------+
| NAME | JOB | YEAR | MONTH | DAYS | SALARY |
+------+--------+------+-------+------+--------+
| Bob | Worker | 2013 | 12 | 3 | 17 |
| Mary | Employ | 2013 | 12 | 5 | 23 |
| Bob | Worker | 2014 | 1 | 10 | 100 |
| Bob | Worker | 2014 | 1 | 11 | 110 |
| Mary | Employ | 2014 | 1 | 15 | 200 |
| Bob | Worker | 2014 | 2 | 8 | 80 |
| Mary | Employ | 2014 | 2 | 5 | 190 |
+------+--------+------+-------+------+--------+
Is there a simple way to get output like this without building every part of the pivot by hand?
index=JOB,MAX(YEAR),NAME,SUM(DAYS)
columns=MONTH
values=SUM(SALARY)
+--------+-----------+------+-----------+-------------+-------------+
|                                 MONTH |           1 |           2 |
+--------+-----------+------+-----------+-------------+-------------+
| JOB    | MAX(YEAR) | NAME | SUM(DAYS) | SUM(SALARY) | SUM(SALARY) |
+--------+-----------+------+-----------+-------------+-------------+
| Employ |      2014 | Mary |        20 |         200 |         190 |
| Worker |      2014 | Bob  |        29 |         210 |          80 |
+--------+-----------+------+-----------+-------------+-------------+
Starting with:
In [179]: df
Out[179]:
NAME JOB YEAR MONTH DAYS SALARY
0 Bob Worker 2013 12 3 17
1 Mary Employ 2013 12 5 23
2 Bob Worker 2014 1 10 100
3 Bob Worker 2014 1 11 110
4 Mary Employ 2014 1 15 200
5 Bob Worker 2014 2 8 80
6 Mary Employ 2014 2 5 190
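The frame above is shown only as console output. A minimal sketch for rebuilding it, assuming the values are exactly those in the question's table (the dict-of-lists constructor is just one convenient way to do so):

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "NAME":   ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB":    ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR":   [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH":  [12, 12, 1, 1, 1, 2, 2],
    "DAYS":   [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})
print(df)
```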
we can get most of the data we need with a groupby/sum:
result = df.groupby(['JOB', 'NAME', 'MONTH', 'YEAR']).sum().reset_index(['MONTH'])
# MONTH DAYS SALARY
# JOB NAME YEAR
# Employ Mary 2014 1 15 200
# 2014 2 5 190
# 2013 12 5 23
# Worker Bob 2014 1 21 210
# 2014 2 8 80
# 2013 12 3 17
To this, we add the total number of days:
total_days = df.groupby(['JOB', 'NAME', 'YEAR'])[['DAYS']].sum()
total_days.columns = ['SUM(DAYS)']
# SUM(DAYS)
# JOB NAME YEAR
# Employ Mary 2013 5
# 2014 20
# Worker Bob 2013 3
# 2014 29
result = result.join(total_days)
del result['DAYS']
# MONTH SALARY SUM(DAYS)
# JOB NAME YEAR
# Employ Mary 2013 12 23 5
# 2014 1 200 20
# 2014 2 190 20
# Worker Bob 2013 12 17 3
# 2014 1 210 29
# 2014 2 80 29
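As a possible alternative to the groupby-then-join step above, `groupby(...).transform("sum")` can broadcast the per-group total back onto every row in one call. This is only a sketch of that idea, not the approach used in the answer: it works on the unaggregated rows, so Bob's two January 2014 rows both remain. The name `with_days` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "NAME":   ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB":    ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR":   [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH":  [12, 12, 1, 1, 1, 2, 2],
    "DAYS":   [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})

# transform("sum") returns one value per original row, aligned to df's index,
# so no join is needed to attach the yearly total of DAYS.
with_days = df.copy()
with_days["SUM(DAYS)"] = (
    df.groupby(["JOB", "NAME", "YEAR"])["DAYS"].transform("sum")
)
print(with_days[["NAME", "YEAR", "MONTH", "SUM(DAYS)"]])
```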
To select the rows associated with max(YEAR), we compute
max_year = df.groupby(['JOB', 'NAME'])[['YEAR']].max()
max_year = max_year.set_index('YEAR', drop=False, append=True)
# YEAR
# JOB NAME YEAR
# Employ Mary 2014 2014
# Worker Bob 2014 2014
so that the selection can be expressed as a left join:
result = max_year.join(result)
del result['YEAR']
# MONTH SALARY SUM(DAYS)
# JOB NAME YEAR
# Employ Mary 2014 1 200 20
# 2014 2 190 20
# Worker Bob 2014 1 210 29
# 2014 2 80 29
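The same selection could also be written as a boolean mask instead of a left join. The sketch below assumes the original, unaggregated `df` and uses a hypothetical name `latest`; it compares each row's YEAR with the per-(JOB, NAME) maximum obtained through `transform("max")`.

```python
import pandas as pd

df = pd.DataFrame({
    "NAME":   ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB":    ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR":   [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH":  [12, 12, 1, 1, 1, 2, 2],
    "DAYS":   [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})

# Per-row maximum YEAR within each (JOB, NAME) group.
max_year_per_row = df.groupby(["JOB", "NAME"])["YEAR"].transform("max")

# Keep only the rows that belong to each person's latest year.
latest = df[df["YEAR"] == max_year_per_row]
print(latest)
```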
Now we can move MONTH into a level of the column index like this:
result = result.set_index(['SUM(DAYS)', 'MONTH'], append=True)
result = result.unstack('MONTH')
result = result.reset_index(['SUM(DAYS)'])
which yields
SUM(DAYS) SALARY
MONTH 1 2
JOB NAME YEAR
Employ Mary 2014 20 200 190
Worker Bob 2014 29 210 80
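For comparison, here is a shorter sketch that reaches an equivalent table with `pivot_table`. It is not the method used above. It assumes that filtering to each person's latest year first and then summing DAYS per (JOB, NAME) is what the question intends. The names `latest` and `table` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "NAME":   ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB":    ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR":   [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH":  [12, 12, 1, 1, 1, 2, 2],
    "DAYS":   [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})

by = ["JOB", "NAME"]

# 1. Keep only the rows from each person's latest year.
latest = df[df["YEAR"] == df.groupby(by)["YEAR"].transform("max")].copy()

# 2. Attach the total DAYS for that year to every remaining row.
latest["SUM(DAYS)"] = latest.groupby(by)["DAYS"].transform("sum")
latest = latest.rename(columns={"YEAR": "MAX(YEAR)"})

# 3. Pivot: MONTH becomes the columns, SALARY is summed per cell.
table = latest.pivot_table(
    index=["JOB", "MAX(YEAR)", "NAME", "SUM(DAYS)"],
    columns="MONTH",
    values="SALARY",
    aggfunc="sum",
)
print(table)
```

Putting SUM(DAYS) into the index keeps it as a row label, so `pivot_table` only has to aggregate SALARY.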