How to groupby time series data

I have the dataframe below; column B's dtype is datetime64.

    A      B
0   a   2016-09-13
1   b   2016-09-14
2   b   2016-09-15
3   a   2016-10-13
4   a   2016-10-14

I would like to group by month (or, more generally, by year or day), so that I get the counts below, keyed on column B.

              a       b
2016-09       1       2
2016-10       2       0

I tried groupby, but I couldn't figure out how to handle a dtype like datetime64. How can I group on a datetime64 column?

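If you set the datetime column as the index, you can pass pd.Grouper with a freq argument to group by various time ranges. In older pandas versions this was spelled pd.TimeGrouper, which has since been deprecated and removed. Example code:

import pandas as pd

# recreate dataframe
df = pd.DataFrame({'A': ['a', 'b', 'b', 'a', 'a'], 'B': ['2016-09-13', '2016-09-14', '2016-09-15',
                                                        '2016-10-13', '2016-10-14']})
df['B'] = pd.to_datetime(df['B'])

# set column B as the index so that pd.Grouper can group on it
df.set_index('B', inplace=True)

# Group by month end ('M'; spelled 'ME' in pandas 2.2 and later) and by A,
# count the rows, then pivot A into columns (the groupby/size/unstack
# approach from Ami Tavory's answer):
df = df.groupby([pd.Grouper(freq='M'), 'A']).size().unstack().fillna(0)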

This returns:

A             a    b
B                   
2016-09-30  1.0  2.0
2016-10-31  2.0  0.0

Alternatively (credit to ayhan), skip the set_index step and name the column with pd.Grouper's key argument, straight after creating the dataframe:

# recreate dataframe
df = pd.DataFrame({'A': ['a', 'b', 'b', 'a', 'a'], 'B': ['2016-09-13', '2016-09-14', '2016-09-15',
                                                        '2016-10-13', '2016-10-14']})
df['B'] = pd.to_datetime(df['B'])
df = df.groupby([pd.Grouper(key='B', freq='M'), 'A']).size().unstack().fillna(0)

This returns the same result.
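The question also asks about grouping by year or by day. As a minimal sketch, assuming the same sample data as above, the `.dt` accessor can supply the grouping key directly; the names `by_year` and `by_day` are chosen here for illustration and do not come from the answers.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'a', 'a'],
                   'B': pd.to_datetime(['2016-09-13', '2016-09-14', '2016-09-15',
                                        '2016-10-13', '2016-10-14'])})

# Count rows per (year, A); .dt.year yields plain integers such as 2016.
by_year = df.groupby([df['B'].dt.year, 'A']).size().unstack(fill_value=0)

# Count rows per (calendar day, A); normalize() sets the time part to midnight,
# so timestamps on the same day share one key.
by_day = df.groupby([df['B'].dt.normalize(), 'A']).size().unstack(fill_value=0)
```

Passing fill_value=0 to unstack keeps the counts as integers, whereas unstack().fillna(0) produces floats.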

Say you start with

In [247]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'a', 'a'], 'B': ['2016-09-13', '2016-09-14', '2016-09-15', '2016-10-13', '2016-10-14']})

In [248]: df.B = pd.to_datetime(df.B)

Then you can groupby, take the size, and unstack:

In [249]: df = df.groupby([df.B.dt.year.astype(str) + '-' + df.B.dt.month.astype(str), df.A]).size().unstack().fillna(0).astype(int)

Finally, you just need to make B a date again:

In [250]: df.index = pd.to_datetime(df.index)

In [251]: df
Out[251]: 
A           a  b
B               
2016-10-01  2  0
2016-09-01  1  2

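Note that the final conversion to datetime assigns a uniform day (the first of the month), because a datetime64 value cannot be "dayless". The rows also appear in string order here ('2016-10' sorts before '2016-9'); calling df.sort_index() after the conversion puts them in chronological order.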
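If "dayless" month labels like those in the question are wanted, one possible alternative (not part of the answers above) is to group on monthly Period values rather than on strings or timestamps. The following is a sketch under the same sample data; the variable name `counts` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'a', 'a'],
                   'B': ['2016-09-13', '2016-09-14', '2016-09-15',
                         '2016-10-13', '2016-10-14']})
df['B'] = pd.to_datetime(df['B'])

# to_period('M') maps each timestamp to its calendar month (a Period with no day part).
counts = (
    df.groupby([df['B'].dt.to_period('M'), 'A'])
      .size()
      .unstack(fill_value=0)
)
```

The resulting index holds the periods 2016-09 and 2016-10 in chronological order, so no conversion back to datetime and no extra sorting step should be needed.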

 