Pandas - dataframe groupby - how to get sum of multiple columns
Pandas - How to get the sum of rows by multiple columns in a DataFrame
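I have the following Pandas DataFrame object df, which represents events that occurred between 2000-07-01 and 2018-03-31. Each row represents an event that occurred on that particular date. FID_1 is the index column and uniquely identifies each event row. The ICC_NAME column contains 33 unique values for the locations where events occurred.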
comb_date ICC_NAME
FID_1
267 2000-09-18 09:49:00 Alexandra
462 2000-10-19 01:00:00 Alexandra
696 2000-11-26 15:08:00 Alexandra
734 2000-11-27 19:20:00 Alexandra
760 2000-11-28 20:00:00 Alexandra
761 2000-11-28 20:30:00 Alexandra
945 2000-05-12 12:37:00 Alexandra
1242 2000-12-12 14:35:00 Alexandra
1440 2000-12-16 06:45:00 Alexandra
1523 2000-12-17 12:55:00 Alexandra
1701 2000-12-19 18:40:00 Alexandra
1899 2000-12-26 11:42:00 Alexandra
1963 2000-12-29 09:43:00 Alexandra
1975 2000-12-29 15:54:00 Alexandra
2004 2000-12-30 13:26:00 Alexandra
2044 2000-12-31 13:18:00 Alexandra
2100 2001-01-01 00:06:00 Alexandra
2202 2001-02-01 13:34:00 Alexandra
2826 2001-11-01 13:32:00 Alexandra
2991 2001-01-15 10:55:00 Alexandra
3175 2001-01-20 11:18:00 Alexandra
3176 2001-01-20 11:35:00 Alexandra
3212 2001-01-20 22:55:00 Alexandra
3371 2001-01-26 14:25:00 Alexandra
3386 2001-01-26 19:05:00 Alexandra
3395 2001-01-27 13:20:00 Alexandra
3432 2001-01-28 18:03:00 Alexandra
3701 2001-06-02 18:29:00 Alexandra
3881 2001-02-14 10:00:00 Alexandra
4131 2001-02-21 17:48:00 Alexandra
... ... ...
... ... ...
... ... Boort
... ... Boort
... ... ...
... ... ...
96968 2018-01-25 17:27:00 Woori Yallock
96983 2018-01-25 19:04:00 Woori Yallock
96995 2018-01-26 00:03:00 Woori Yallock
97002 2018-01-26 09:39:00 Woori Yallock
97105 2018-01-28 11:12:00 Woori Yallock
97143 2018-01-29 14:42:00 Woori Yallock
97144 2018-01-29 15:00:00 Woori Yallock
97160 2018-01-30 21:54:00 Woori Yallock
97249 2018-06-02 22:40:00 Woori Yallock
97314 2018-11-02 12:38:00 Woori Yallock
97361 2018-02-13 16:49:00 Woori Yallock
97362 2018-02-13 16:55:00 Woori Yallock
97368 2018-02-14 05:48:00 Woori Yallock
97446 2018-02-18 11:17:00 Woori Yallock
97475 2018-02-19 18:52:00 Woori Yallock
97485 2018-02-20 15:42:00 Woori Yallock
97496 2018-02-20 22:19:00 Woori Yallock
97514 2018-02-22 14:47:00 Woori Yallock
97563 2018-02-25 20:37:00 Woori Yallock
97641 2018-02-28 17:19:00 Woori Yallock
97642 2018-02-28 17:45:00 Woori Yallock
97769 2018-07-03 07:35:00 Woori Yallock
97786 2018-07-03 22:05:00 Woori Yallock
97902 2018-11-03 16:20:00 Woori Yallock
97938 2018-12-03 14:33:00 Woori Yallock
97939 2018-12-03 14:35:00 Woori Yallock
97946 2018-12-03 20:23:00 Woori Yallock
98046 2018-03-17 18:24:00 Woori Yallock
98090 2018-03-18 11:06:00 Woori Yallock
98207 2018-03-22 19:58:00 Woori Yallock
[98372 rows x 2 columns]
What I want to achieve is to get the total number of events for each YYYY-MM and each ICC_NAME.
yyyy-mm Alexandra Boort ... Woori Yallock
2000-07 29 12 ... 8
2000-08 20 16 ... 13
... ...
... ...
2018-03 41 8 ... 28
I was thinking of using resample, but I am not sure which column sum() should be applied to.
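Use crosstab, converting the datetimes to monthly periods with Series.dt.to_period. Finally, set the index name and clear the columns name with DataFrame.rename_axis, and turn the PeriodIndex into a column with DataFrame.reset_index: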
import pandas as pd

df['comb_date'] = pd.to_datetime(df['comb_date'])
# Count events per month (rows) and per location (columns)
df1 = (pd.crosstab(df['comb_date'].dt.to_period('M'), df['ICC_NAME'])
         .rename_axis(columns=None, index='yyyy-mm')
         .reset_index())
print(df1)
yyyy-mm Alexandra Woori Yallock
0 2000-05 1 0
1 2000-09 1 0
2 2000-10 1 0
3 2000-11 4 0
4 2000-12 9 0
5 2001-01 9 0
6 2001-02 3 0
7 2001-06 1 0
8 2001-11 1 0
9 2018-01 0 8
10 2018-02 0 11
11 2018-03 0 3
12 2018-06 0 1
13 2018-07 0 2
14 2018-11 0 2
15 2018-12 0 3
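Since the question mentions resample and sum(), it may help to note that each row is already one event, so the task is a count of rows rather than a sum over a column. A minimal sketch of an equivalent approach with groupby, size and unstack follows. The four-row frame is a hypothetical miniature of df made up for illustration, not the real data.

```python
import pandas as pd

# Hypothetical miniature of df: four events at two locations.
df = pd.DataFrame(
    {
        "comb_date": [
            "2000-09-18 09:49:00",
            "2000-09-20 01:00:00",
            "2000-10-19 01:00:00",
            "2000-09-25 17:27:00",
        ],
        "ICC_NAME": ["Alexandra", "Alexandra", "Alexandra", "Woori Yallock"],
    },
    index=pd.Index([267, 300, 462, 500], name="FID_1"),
)
df["comb_date"] = pd.to_datetime(df["comb_date"])

# Monthly period for each event, e.g. 2000-09.
month = df["comb_date"].dt.to_period("M")

# Count rows per (month, location), then pivot locations into columns.
# Month/location pairs with no events are filled with 0.
counts = (
    df.groupby([month, "ICC_NAME"])
    .size()
    .unstack(fill_value=0)
    .rename_axis(columns=None, index="yyyy-mm")
    .reset_index()
)
print(counts)
```

On this sample, 2000-09 has two Alexandra events and one Woori Yallock event, and 2000-10 has one Alexandra event. Months in which no event occurred at any location do not appear as rows in either approach.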