Count maximum occurrences in a dataframe column for a specific timespan

This question is intended as a follow-up to the following question: Count maximum consecutive occurences of a string in a dataframe column

Let's say I now have the following dataframe:

      col1      col2
0  21.02.2020  string1
1  19.02.2020  string1
2  16.02.2020  string1
3  14.02.2020  string2
4  10.02.2020  string3
5  08.02.2020  string3
6  02.02.2020  string1

How can I determine the maximum number of occurrences of a string in any one-week period starting on a Monday and ending on a Sunday? And how can I do the same for any two-week period, starting on a Monday and ending on the Sunday of the following week?

I would like to count the occurrences so that, if the dataframe spans 5 weeks, the result is the highest number of occurrences of string1 in any single week of that timespan. And if the dataframe were, for example, only:

      col1      col2
0  21.02.2020  string1

it would return 1 for string1.

I believe you need to count the number of rows that fall into each week, for each run of consecutive identical strings:

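import pandas as pd

# Parse the day-first date strings (e.g. 21.02.2020) into datetimes.
df['col1'] = pd.to_datetime(df['col1'], dayfirst=True)
# Give each run of consecutive identical strings its own group id.
g = df['col2'].ne(df['col2'].shift()).cumsum()

# Count rows per run and per weekly bin ('W-Mon' bins end on a Monday).
df1 = df.groupby([g, pd.Grouper(freq='W-Mon', key='col1')])['col2'].agg(['first','size'])
print(df1)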
                   first  size
col2 col1                     
1    2020-02-17  string1     1
     2020-02-24  string1     2
2    2020-02-17  string2     1
3    2020-02-10  string3     2
4    2020-02-03  string1     1
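A side note on the weekly bins in this output: in pandas, an anchored frequency such as 'W-Mon' describes weeks that end on a Monday, so each bin runs from Tuesday through Monday and is labelled by that Monday. The following minimal sketch (the column names in `bins` are made up for illustration) shows, for the sample dates, where each date's week ends under 'W-MON' compared with 'W-SUN', which gives Monday-to-Sunday weeks:

```python
import pandas as pd

# Sample dates from the question, parsed day-first.
d = pd.Series(pd.to_datetime(
    ["21.02.2020", "19.02.2020", "16.02.2020", "14.02.2020",
     "10.02.2020", "08.02.2020", "02.02.2020"], dayfirst=True))

bins = pd.DataFrame({
    "date": d,
    "weekday": d.dt.day_name(),
    # Last day of the week each date belongs to under the two anchors.
    "W-MON week ends": d.dt.to_period("W-MON").dt.end_time.dt.normalize(),
    "W-SUN week ends": d.dt.to_period("W-SUN").dt.end_time.dt.normalize(),
})
print(bins)
```

Under 'W-MON', Saturday 08.02.2020 and Monday 10.02.2020 share a bin, which is why string3 shows a size of 2 above; under 'W-SUN' they fall into different weeks.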

# For each string keep the row with the largest count, then drop the run id from the index.
df2 = (df1.sort_values('size')
          .drop_duplicates('first', keep='last')
          .reset_index(level=0, drop=True))
print(df2)
              first  size
col1                     
2020-02-17  string2     1
2020-02-24  string1     2
2020-02-10  string3     2
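If the goal is to count all occurrences within strict Monday-to-Sunday calendar weeks (not only consecutive ones), and also over any two consecutive weeks, one possible alternative is sketched below. This is not the approach used above: the helper name `max_count_per_window` and its parameters are hypothetical, and it assumes that "any two-week period" means a sliding window over adjacent calendar weeks.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["21.02.2020", "19.02.2020", "16.02.2020", "14.02.2020",
             "10.02.2020", "08.02.2020", "02.02.2020"],
    "col2": ["string1", "string1", "string1", "string2",
             "string3", "string3", "string1"],
})
df["col1"] = pd.to_datetime(df["col1"], dayfirst=True)


def max_count_per_window(frame, weeks=1, date_col="col1", label_col="col2"):
    """Highest count per string over any `weeks` consecutive Mon-Sun weeks."""
    dates = frame[date_col].dt.normalize()
    # Monday of the calendar week each row falls in (Monday=0 ... Sunday=6).
    week_start = dates - pd.to_timedelta(dates.dt.weekday, unit="D")
    # Rows = week start, columns = string, values = number of occurrences.
    weekly = pd.crosstab(week_start, frame[label_col])
    # Add weeks without any rows so a window always covers adjacent weeks.
    all_weeks = pd.date_range(weekly.index.min(), weekly.index.max(), freq="7D")
    weekly = weekly.reindex(all_weeks, fill_value=0)
    # Sum every run of `weeks` consecutive weeks, then take the best per string.
    return weekly.rolling(weeks, min_periods=1).sum().max().astype(int)


print(max_count_per_window(df, weeks=1))
print(max_count_per_window(df, weeks=2))
```

For the sample data this gives 2 for string1 in a single week and 3 over two consecutive weeks; for a dataframe containing only the first row it gives 1 for string1. The results can differ from the output above because the weeks here run Monday to Sunday and non-consecutive occurrences are counted too.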
