I have this dataframe df:
date dir
0 2018-01-23 11:39:41 O1
1 2018-01-23 12:47:58 E0
2 2018-01-23 13:01:19 O1
3 2018-01-23 13:01:21 O1
4 2018-01-23 13:06:06 O1
5 2018-01-23 13:32:55 O1
6 2018-01-23 13:33:56 O1
7 2018-01-23 13:33:58 O1
8 2018-01-23 13:46:47 E0
9 2018-01-23 14:04:01 E0
10 2018-01-23 14:04:39 O1
11 2018-01-23 14:09:16 E0
12 2018-01-23 14:17:46 E0
...
I want to count the number of occurrences by date (hourly) and by dir (direction). There are two different directions: E0 and O1.
So I did this:
df = df.groupby(['dir',pd.Grouper(key='date', freq='H')]).size()
As expected, I got something like this:
dir date
E0 2018-01-23 12:00:00 1
2018-01-23 13:00:00 1
2018-01-23 14:00:00 5
...
O1 2018-05-21 19:00:00 1
2018-05-21 20:00:00 1
2018-05-22 06:00:00 2
...
But I would like to create a new column for each distinct direction:
date E0 O1
2018-05-21 19:00:00 1 0
2018-05-21 20:00:00 1 2
2018-05-22 06:00:00 2 0
...
How could I do that?
Use Series.unstack on the first level, with the fill_value parameter to replace the NaN values that would otherwise appear for combinations of dir and date that do not exist:
df = df.groupby(['dir',pd.Grouper(key='date', freq='H')]).size().unstack(0, fill_value=0)
print (df)
dir E0 O1
date
2018-01-23 11:00:00 0 1
2018-01-23 12:00:00 1 0
2018-01-23 13:00:00 1 6
2018-01-23 14:00:00 3 1
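For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the sample rows in the question. The variable name counts is just for illustration, and the frequency is written as "60min" on the assumption that it bins the same way as freq='H' while avoiding the hourly alias, which changed spelling between pandas versions.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-23 11:39:41", "2018-01-23 12:47:58", "2018-01-23 13:01:19",
        "2018-01-23 13:01:21", "2018-01-23 13:06:06", "2018-01-23 13:32:55",
        "2018-01-23 13:33:56", "2018-01-23 13:33:58", "2018-01-23 13:46:47",
        "2018-01-23 14:04:01", "2018-01-23 14:04:39", "2018-01-23 14:09:16",
        "2018-01-23 14:17:46",
    ]),
    "dir": ["O1", "E0", "O1", "O1", "O1", "O1", "O1",
            "O1", "E0", "E0", "O1", "E0", "E0"],
})

# Count rows per (dir, hour), then move the dir level into columns;
# fill_value=0 fills hours in which one direction has no rows
counts = (
    df.groupby(["dir", pd.Grouper(key="date", freq="60min")])
      .size()
      .unstack(0, fill_value=0)
)
print(counts)
```

Note that grouping by two keys only returns hours in which at least one row exists, so an hour with no events in either direction will not appear in the result.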
Another possible solution is to use pd.pivot_table():
df.pivot_table(index=['date'], columns='dir', aggfunc='size', fill_value=0).resample('1H').sum()
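A closely related variant, swapped in here as an illustration rather than taken from the answers above, is pd.crosstab on the timestamps floored to the hour. This is only a sketch: the name hourly is made up, and "60min" is again assumed to be equivalent to the hourly frequency.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-23 11:39:41", "2018-01-23 12:47:58", "2018-01-23 13:01:19",
        "2018-01-23 13:01:21", "2018-01-23 13:06:06", "2018-01-23 13:32:55",
        "2018-01-23 13:33:56", "2018-01-23 13:33:58", "2018-01-23 13:46:47",
        "2018-01-23 14:04:01", "2018-01-23 14:04:39", "2018-01-23 14:09:16",
        "2018-01-23 14:17:46",
    ]),
    "dir": ["O1", "E0", "O1", "O1", "O1", "O1", "O1",
            "O1", "E0", "E0", "O1", "E0", "E0"],
})

# Floor each timestamp to the start of its hour, then count
# how often each (hour, dir) pair occurs
hourly = pd.crosstab(df["date"].dt.floor("60min"), df["dir"])
print(hourly)
```

Missing combinations come out as 0 without an explicit fill_value, because crosstab counts frequencies by default.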