I have recently begun working with pandas' pivot_table and have run into a problem using timestamps properly in pivot tables.
I have a large DataFrame with data like the following:
Date ID Days Quantity Concern
0 2012-06-29 NaN 621 NaN A
1 2012-06-29 1208985 874 1 A
2 2012-06-29 NaN 621 2 B
3 2012-06-29 NaN 874 1 C
4 2012-06-29 NaN 566 NaN A
5 2012-06-29 251254 780 NaN A
6 2012-06-29 NaN 566 NaN C
7 2012-06-29 385379 566 1 B
8 2012-06-29 967911 780 1 B
9 2012-06-29 NaN 521 NaN A
10 2012-06-29 1208985 834 1 C
11 2012-06-29 385379 374 NaN A
12 2012-06-29 967909 780 1 B
13 2012-07-18 NaN 821 NaN A
14 2012-07-18 251254 821 NaN A
15 2012-08-04 756444 676 1 C
16 2012-08-04 756444 676 2 C
17 2012-08-04 NaN 676 NaN A
18 2012-08-24 NaN 571 NaN B
19 2012-08-24 251254 446 1 B
A line like the below works great:
pd.pivot_table(data,index=['Concern'],columns=['ID'],values=['Quantity'],aggfunc='sum')
Currently, when I use the Date column as the index (index=['Date']), it groups by day. I would like the option of grouping by month or year. Is there a way to do this with pivot tables when the Date column holds Timestamp objects?
You can access information such as the year and month through the .dt accessor that datetime Series have, so you can easily create new columns like:
df['Month'] = df['Date'].dt.month
Then use those columns to create the pivot table:
pd.pivot_table(df, index=['Month'], columns=['ID'],
values=['Quantity'],aggfunc='sum')
Output:
Out[16]:
Quantity
ID 251254 385379 756444 967909 967911 1208985
Month
6 NaN 1 NaN 1 1 2
7 NaN NaN NaN NaN NaN NaN
8 1 NaN 3 NaN NaN NaN
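One thing to watch for: .dt.month alone puts the same month from different years in one row (June 2012 and June 2013 would both land under 6). If the data spans more than one year, a year column or a year-month period keeps them apart. Below is a minimal sketch of both, using a small made-up DataFrame in the same shape as the question's data; the column names Year and YearMonth are just illustrative choices.

```python
import pandas as pd

# Small made-up sample shaped like the question's data (not the real dataset)
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2012-06-29", "2012-06-29", "2012-08-04", "2012-08-04", "2013-01-15"]
    ),
    "ID": [1208985, 385379, 756444, 756444, 385379],
    "Quantity": [1, 1, 1, 2, 4],
})

# Derive grouping keys from the datetime column
df["Year"] = df["Date"].dt.year                  # e.g. 2012
df["YearMonth"] = df["Date"].dt.to_period("M")   # e.g. Period('2012-06', 'M')

# Group by year
by_year = pd.pivot_table(df, index="Year", columns="ID",
                         values="Quantity", aggfunc="sum")

# Group by year-month, so the same month in different years stays separate
by_year_month = pd.pivot_table(df, index="YearMonth", columns="ID",
                               values="Quantity", aggfunc="sum")

print(by_year)
print(by_year_month)
```

The same pattern should work for other attributes exposed by .dt, such as .dt.quarter or .dt.day.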