Use one column of a groupby to create X new columns with pandas

I have this dataframe df:

                     date dir
0     2018-01-23 11:39:41  O1
1     2018-01-23 12:47:58  E0
2     2018-01-23 13:01:19  O1
3     2018-01-23 13:01:21  O1
4     2018-01-23 13:06:06  O1
5     2018-01-23 13:32:55  O1
6     2018-01-23 13:33:56  O1
7     2018-01-23 13:33:58  O1
8     2018-01-23 13:46:47  E0
9     2018-01-23 14:04:01  E0
10    2018-01-23 14:04:39  O1
11    2018-01-23 14:09:16  E0
12    2018-01-23 14:17:46  E0
...

I want to count the number of occurrences per hour (from the date column) and per direction (dir). There are two distinct directions: E0 and O1.

So I've done that:

df = df.groupby(['dir',pd.Grouper(key='date', freq='H')]).size()

Of course I got something like that:

dir  date               
E0   2018-01-23 12:00:00     1
     2018-01-23 13:00:00     1
     2018-01-23 14:00:00     5
...
O1   2018-05-21 19:00:00     1
     2018-05-21 20:00:00     1
     2018-05-22 06:00:00     2
...

But I would like to create a new column for each distinct direction:

                date    E0 O1
 2018-05-21 19:00:00     1  0
 2018-05-21 20:00:00     1  2
 2018-05-22 06:00:00     2  0
...

How can I do that?

Use Series.unstack on the first index level (dir), and pass fill_value=0 to replace the NaN values that would otherwise appear for dir and date combinations that do not exist:

df = df.groupby(['dir',pd.Grouper(key='date', freq='H')]).size().unstack(0, fill_value=0)
print (df)
dir                  E0  O1
date                       
2018-01-23 11:00:00   0   1
2018-01-23 12:00:00   1   0
2018-01-23 13:00:00   1   6
2018-01-23 14:00:00   3   1
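
For reference, here is a minimal self-contained sketch of the same approach, rebuilt from the 13 sample rows shown in the question (the rows elided by "..." are not included). The name `counts` is only illustrative, and the lowercase `'h'` alias is used on the assumption of a recent pandas, where the uppercase `'H'` is deprecated.

```python
import pandas as pd

# The 13 sample rows from the question; the elided rows are left out.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-23 11:39:41", "2018-01-23 12:47:58", "2018-01-23 13:01:19",
        "2018-01-23 13:01:21", "2018-01-23 13:06:06", "2018-01-23 13:32:55",
        "2018-01-23 13:33:56", "2018-01-23 13:33:58", "2018-01-23 13:46:47",
        "2018-01-23 14:04:01", "2018-01-23 14:04:39", "2018-01-23 14:09:16",
        "2018-01-23 14:17:46",
    ]),
    "dir": ["O1", "E0", "O1", "O1", "O1", "O1", "O1",
            "O1", "E0", "E0", "O1", "E0", "E0"],
})

# Count rows per (dir, hour), then move the 'dir' level into columns.
# fill_value=0 covers hours in which one direction never occurs.
counts = (
    df.groupby(["dir", pd.Grouper(key="date", freq="h")])
      .size()
      .unstack(0, fill_value=0)
)
print(counts)
```

Note that only hours containing at least one row appear in the result; an hour with no events in either direction gets no row with this approach.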

Another possible solution is to use pd.pivot_table():

df.pivot_table(index= ['date'], columns='dir', aggfunc='size', fill_value=0).resample('1H').sum()
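
As a rough check of the pivot_table variant, the sketch below runs it on the same 13 sample rows from the question. The name `hourly` is only illustrative, and `'h'` is used instead of `'1H'` on the assumption of a recent pandas version.

```python
import pandas as pd

# The 13 sample rows from the question; the elided rows are left out.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-23 11:39:41", "2018-01-23 12:47:58", "2018-01-23 13:01:19",
        "2018-01-23 13:01:21", "2018-01-23 13:06:06", "2018-01-23 13:32:55",
        "2018-01-23 13:33:56", "2018-01-23 13:33:58", "2018-01-23 13:46:47",
        "2018-01-23 14:04:01", "2018-01-23 14:04:39", "2018-01-23 14:09:16",
        "2018-01-23 14:17:46",
    ]),
    "dir": ["O1", "E0", "O1", "O1", "O1", "O1", "O1",
            "O1", "E0", "E0", "O1", "E0", "E0"],
})

# Step 1: one row per exact timestamp, one column per direction (0/1 counts).
# Step 2: resample the DatetimeIndex into hourly bins and sum the counts.
hourly = (
    df.pivot_table(index=["date"], columns="dir", aggfunc="size", fill_value=0)
      .resample("h")
      .sum()
)
print(hourly)
```

One difference worth keeping in mind: resample emits a row for every hour between the first and last timestamp, so hours with no events at all show up as rows of zeros, whereas the groupby and unstack approach omits them.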
