
Pandas group_by date and resample

I have a data frame that looks like this:

    A   B   C   date
0   J   Y   2   2013-02-01 14:21:02.070030
1   X   X   0   2013-02-01 15:49:33.110849
2   Y   D   9   2013-02-01 06:47:19.369514
3   Y   C   17  2013-02-01 08:56:11.751781
4   3   J   21  2013-02-01 14:19:12.017232

I'd like to group by date and then count, but omit the information about the hours, minutes, seconds, etc.

It seems like something like this works:

df.set_index('date').resample('D').count()

Two questions:

  1. Why does that work? Is that the right way?
  2. Why doesn't something like df.groupby('date').resample('D').count() work?

resample is in some sense just a special case of groupby - rather than grouping on distinct values, which is what groupby('date') would do, it groups on a time-based transformation of the index, which is why you need to set the index first. Alternatively, you could do:

df.groupby(pd.Grouper(key='date', freq='D')).count()

As of pandas 0.19.0, you can also write the above like this:

df.resample('D', on='date').count()
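
To see the difference concretely, here is a minimal sketch that rebuilds the sample frame from the question and runs the three daily-count variants next to a plain groupby('date'). The variable names (by_index, by_grouper, by_on, by_value) are only illustrative, and it assumes the date column is already a datetime dtype (here via pd.to_datetime):

```python
import pandas as pd

# Sample frame from the question; "date" is parsed to datetime64 up front.
df = pd.DataFrame(
    {
        "A": ["J", "X", "Y", "Y", "3"],
        "B": ["Y", "X", "D", "C", "J"],
        "C": [2, 0, 9, 17, 21],
        "date": pd.to_datetime(
            [
                "2013-02-01 14:21:02.070030",
                "2013-02-01 15:49:33.110849",
                "2013-02-01 06:47:19.369514",
                "2013-02-01 08:56:11.751781",
                "2013-02-01 14:19:12.017232",
            ]
        ),
    }
)

# 1. resample works on the index, so move "date" there first.
by_index = df.set_index("date").resample("D").count()

# 2. groupby with a Grouper applies the same daily binning to a column.
by_grouper = df.groupby(pd.Grouper(key="date", freq="D")).count()

# 3. resample with on= (pandas >= 0.19.0) reads the column directly.
by_on = df.resample("D", on="date").count()

# For contrast: grouping on the raw column makes one group per distinct
# timestamp, so nothing is collapsed to the day level.
by_value = df.groupby("date").count()

print(by_index)
print(by_value)
```

All five sample rows fall on 2013-02-01, so each of the three daily variants yields a single bin, while groupby('date') keeps five separate groups because every timestamp is distinct.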
