Pandas multiple index with multiple aggregate functions
Given this sample DataFrame:
+------+--------+------+-------+------+--------+
| NAME | JOB | YEAR | MONTH | DAYS | SALARY |
+------+--------+------+-------+------+--------+
| Bob | Worker | 2013 | 12 | 3 | 17 |
| Mary | Employ | 2013 | 12 | 5 | 23 |
| Bob | Worker | 2014 | 1 | 10 | 100 |
| Bob | Worker | 2014 | 1 | 11 | 110 |
| Mary | Employ | 2014 | 1 | 15 | 200 |
| Bob | Worker | 2014 | 2 | 8 | 80 |
| Mary | Employ | 2014 | 2 | 5 | 190 |
+------+--------+------+-------+------+--------+
Is there a simple way to get output like this without building every part of the pivot by hand?
index=JOB,MAX(YEAR),NAME,SUM(DAYS)
columns=MONTH
values=SUM(SALARY)
+-----------+-------------+-------------+
| MONTH | 1 | 2 |
+--------+-----------+------+-----------+-------------+-------------+
| JOB | MAX(YEAR) | NAME | SUM(DAYS) | SUM(SALARY) | SUM(SALARY) |
+--------+-----------+------+-----------+-------------+-------------+
| Employ | 2014 | Mary | 20 | 200 | 190 |
| Worker | 2014 | Bob | 29 | 210 | 80 |
+--------+-----------+------+-----------+-------------+-------------+
Starting from:
In [179]: df
Out[179]:
   NAME     JOB  YEAR  MONTH  DAYS  SALARY
0   Bob  Worker  2013     12     3      17
1  Mary  Employ  2013     12     5      23
2   Bob  Worker  2014      1    10     100
3   Bob  Worker  2014      1    11     110
4  Mary  Employ  2014      1    15     200
5   Bob  Worker  2014      2     8      80
6  Mary  Employ  2014      2     5     190
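For anyone who wants to reproduce the steps, here is one way to build the sample frame. The values are copied from the table in the question; storing every numeric column as a plain integer is my assumption.

```python
import pandas as pd

# Sample data copied from the question; integer dtypes are assumed.
df = pd.DataFrame({
    "NAME": ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB": ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR": [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH": [12, 12, 1, 1, 1, 2, 2],
    "DAYS": [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})
print(df)
```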
we can get most of the data we want with:
result = df.groupby(['JOB', 'NAME', 'MONTH', 'YEAR']).sum().reset_index(['MONTH'])
#                   MONTH  DAYS  SALARY
# JOB    NAME YEAR
# Employ Mary 2014      1    15     200
#             2014      2     5     190
#             2013     12     5      23
# Worker Bob  2014      1    21     210
#             2014      2     8      80
#             2013     12     3      17
To this we add the sum of the days per (JOB, NAME, YEAR) group:
total_days = df.groupby(['JOB', 'NAME', 'YEAR'])[['DAYS']].sum()
total_days.columns = ['SUM(DAYS)']
#                   SUM(DAYS)
# JOB    NAME YEAR
# Employ Mary 2013          5
#             2014         20
# Worker Bob  2013          3
#             2014         29
result = result.join(total_days)
del result['DAYS']
#                   MONTH  SALARY  SUM(DAYS)
# JOB    NAME YEAR
# Employ Mary 2013     12      23          5
#             2014      1     200         20
#             2014      2     190         20
# Worker Bob  2013     12      17          3
#             2014      1     210         29
#             2014      2      80         29
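As a possible alternative to the join, the per-year total can be attached with `groupby(...).transform("sum")`, which broadcasts each group's sum back onto its rows. This is only a sketch: the name `monthly` is mine, and it keeps the group keys as ordinary columns rather than as an index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "NAME": ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB": ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR": [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH": [12, 12, 1, 1, 1, 2, 2],
    "DAYS": [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})

# One row per (JOB, NAME, YEAR, MONTH) with summed DAYS and SALARY.
monthly = df.groupby(["JOB", "NAME", "YEAR", "MONTH"], as_index=False)[["DAYS", "SALARY"]].sum()

# Broadcast the yearly total of DAYS onto every month of that year.
monthly["SUM(DAYS)"] = monthly.groupby(["JOB", "NAME", "YEAR"])["DAYS"].transform("sum")
monthly = monthly.drop(columns="DAYS")
print(monthly)
```

The trade-off is that `transform` avoids building and joining a second frame, but the result has a flat index, so the later index-based join would need adapting.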
To select the rows associated with max(YEAR), we compute
max_year = df.groupby(['JOB', 'NAME'])[['YEAR']].max()
max_year = max_year.set_index('YEAR', drop=False, append=True)
#                   YEAR
# JOB    NAME YEAR
# Employ Mary 2014  2014
# Worker Bob  2014  2014
so that the selection can be expressed as a left join:
result = max_year.join(result)
del result['YEAR']
#                   MONTH  SALARY  SUM(DAYS)
# JOB    NAME YEAR
# Employ Mary 2014      1     200         20
#             2014      2     190         20
# Worker Bob  2014      1     210         29
#             2014      2      80         29
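The same "keep only the rows for max(YEAR)" selection can also be sketched as a boolean mask on the original frame, instead of a join. The name `latest` is hypothetical, and this version filters the raw rows before any aggregation.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "NAME": ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB": ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR": [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH": [12, 12, 1, 1, 1, 2, 2],
    "DAYS": [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})

# For every row, the latest YEAR seen for its (JOB, NAME) pair.
max_year_per_row = df.groupby(["JOB", "NAME"])["YEAR"].transform("max")

# Keep only rows that belong to that latest year.
latest = df[df["YEAR"] == max_year_per_row]
print(latest)
```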
Now we can move MONTH into a hierarchical column level like this:
result = result.set_index(['SUM(DAYS)', 'MONTH'], append=True)
result = result.unstack('MONTH')
result = result.reset_index(['SUM(DAYS)'])
which yields
                  SUM(DAYS) SALARY
MONTH                            1    2
JOB    NAME YEAR
Employ Mary 2014         20    200  190
Worker Bob  2014         29    210   80
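For comparison, a shorter route to the same numbers might be to filter to the latest year first and then let `pivot_table` do the reshaping. This is a sketch under the assumption that SUM(DAYS) should cover only the latest year, as in the result above; `latest` and `table` are names I made up.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "NAME": ["Bob", "Mary", "Bob", "Bob", "Mary", "Bob", "Mary"],
    "JOB": ["Worker", "Employ", "Worker", "Worker", "Employ", "Worker", "Employ"],
    "YEAR": [2013, 2013, 2014, 2014, 2014, 2014, 2014],
    "MONTH": [12, 12, 1, 1, 1, 2, 2],
    "DAYS": [3, 5, 10, 11, 15, 8, 5],
    "SALARY": [17, 23, 100, 110, 200, 80, 190],
})

# Keep only the rows of each (JOB, NAME) pair's latest year.
latest = df[df["YEAR"] == df.groupby(["JOB", "NAME"])["YEAR"].transform("max")].copy()

# Attach the total DAYS for that year to every remaining row.
latest["SUM(DAYS)"] = latest.groupby(["JOB", "NAME", "YEAR"])["DAYS"].transform("sum")

# Pivot: MONTH becomes the columns, SALARY is summed per cell.
table = latest.pivot_table(
    index=["JOB", "YEAR", "NAME", "SUM(DAYS)"],
    columns="MONTH",
    values="SALARY",
    aggfunc="sum",
)
print(table)
```

Here the index order follows the `index=JOB,MAX(YEAR),NAME,SUM(DAYS)` layout from the question, and the columns are a single MONTH level rather than a two-level (SALARY, MONTH) hierarchy.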