
Find max number of consecutive days

The code below groups the dataframe by a key.

import pandas as pd

# `data` holds the (id, date, cnt) rows; its definition is not shown here
df = pd.DataFrame(data, columns=['id', 'date', 'cnt'])
df['date'] = pd.to_datetime(df['date'])
for c_id, group in df.groupby('id'):
    print(c_id)
    print(group)

This produces a result like this:

    id       date  cnt
     1 2019-01-02    1
     1 2019-01-03    2
     1 2019-01-04    3
     1 2019-01-05    1
     1 2019-01-06    2
     1 2019-01-07    1

    id       date      cnt
     2 2019-01-01   478964
     2 2019-01-02   749249
     2 2019-01-03  1144842
     2 2019-01-04  1540846
     2 2019-01-05  1444918
     2 2019-01-06  1624770
     2 2019-01-07  2227589

    id       date     cnt
     3 2019-01-01   41776
     3 2019-01-02   82322
     3 2019-01-03   93467
     3 2019-01-04   56674
     3 2019-01-05   47606
     3 2019-01-06   41448
     3 2019-01-07  145827

    id       date     cnt
     4 2019-01-01   41776
     4 2019-01-02   82322
     4 2019-01-06   93467
     4 2019-01-07   56674

From this result, I want to find the maximum number of consecutive days for each id. The expected answer is 6 for id 1, 7 for id 2, 7 for id 3, and 2 for id 4 (its dates jump from 2019-01-02 to 2019-01-06, giving two runs of 2 days each).

Use:

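# Convert 'date' to datetime if needed; otherwise drop the assign() step
m = (df.assign(date=pd.to_datetime(df['date']))
       .groupby('id')['date']
       .diff()
       .gt(pd.Timedelta('1D'))
       .cumsum())
# Series.max(level=...) was removed in pandas 2.0; group by the index level instead
df.groupby(['id', m]).size().groupby(level='id').max()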

Output

id
1    6
2    7
3    7
4    2
dtype: int64
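
For anyone who wants to try the idea end to end, here is a minimal self-contained sketch. It assumes the rows are sorted by id and date (the sort is done explicitly) and uses only the rows for ids 1 and 4 from the question. The names gap, run_id and longest are illustrative and not part of the answer above.

```python
import pandas as pd

# Rows for ids 1 and 4, copied from the question
data = [
    (1, "2019-01-02", 1), (1, "2019-01-03", 2), (1, "2019-01-04", 3),
    (1, "2019-01-05", 1), (1, "2019-01-06", 2), (1, "2019-01-07", 1),
    (4, "2019-01-01", 41776), (4, "2019-01-02", 82322),
    (4, "2019-01-06", 93467), (4, "2019-01-07", 56674),
]
df = pd.DataFrame(data, columns=["id", "date", "cnt"])
df["date"] = pd.to_datetime(df["date"])

# The diff-based approach relies on dates being ordered within each id
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# True wherever the gap to the previous row of the same id exceeds one day
gap = df.groupby("id")["date"].diff().gt(pd.Timedelta("1D"))

# Each True starts a new run, so the running total labels the runs
run_id = gap.cumsum().rename("run")

# Size of every (id, run) pair, then the largest run per id
longest = df.groupby(["id", run_id]).size().groupby(level="id").max()
print(longest)
```

With these rows, id 1 has one run of 6 days and id 4 has two runs of 2 days, so the result is 6 and 2.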

To get your result, run:

result = df.groupby('id').apply(lambda grp: grp.groupby(
    (grp.date.shift() + pd.Timedelta(1, 'd') != grp.date).cumsum())
    .id.count().max()) 

Details:

  • df.groupby('id') - First-level grouping (by id).
  • grp.groupby(...) - Second-level grouping (by sequences of consecutive dates).
  • grp.date.shift() - Date from the previous row (NaT for the first row).
  • + pd.Timedelta(1, 'd') - Shifted forward by 1 day.
  • != grp.date - Not equal to the current date. The result is a bool Series that is True at the start of each sequence of consecutive dates.
  • cumsum() - Convert the above bool Series to an int Series that labels the sequences with consecutive numbers, starting from 1.
  • id - Take the id column from each (second-level) group.
  • count() - Compute the size of the current group.
  • .max() - Take the maximum of the second-level group sizes (within the current first-level group).
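
The steps listed above can be followed on a single group. The sketch below builds the id 4 rows from the question by hand and prints each intermediate value. The variable names (starts, labels, sizes) are illustrative, and pd.Timedelta(days=1) is used as an equivalent spelling of a one-day offset.

```python
import pandas as pd

# The id 4 group from the question: two runs of two consecutive days
grp = pd.DataFrame({
    "id": [4, 4, 4, 4],
    "date": pd.to_datetime(
        ["2019-01-01", "2019-01-02", "2019-01-06", "2019-01-07"]
    ),
})

# Previous row's date plus one day (NaT for the first row)
prev_plus_one = grp.date.shift() + pd.Timedelta(days=1)

# True where a new sequence of consecutive dates begins
starts = prev_plus_one != grp.date

# Running total of the starts gives one label per sequence
labels = starts.cumsum()

# Rows per sequence, then the longest sequence
sizes = grp.groupby(labels).id.count()

print(starts.tolist())   # [True, False, True, False]
print(labels.tolist())   # [1, 1, 2, 2]
print(sizes.max())       # 2
```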
