This question is intended as a follow-up to the following question: Count maximum consecutive occurences of a string in a dataframe column
Let's say I now have the following dataframe:
         col1     col2
0  21.02.2020  string1
1  19.02.2020  string1
2  16.02.2020  string1
3  14.02.2020  string2
4  10.02.2020  string3
5  08.02.2020  string3
6  02.02.2020  string1
How can I now determine the maximum number of occurrences of a string in any one-week period starting on a Monday and ending on a Sunday? And how can I do the same for any two-week period starting on a Monday and ending on the Sunday of the following week?
I would like to count the occurrences so that if the dataframe spans 5 weeks, it would return the highest number of occurrences of string1 in a single week within that timespan. And if the dataframe were, for example, only:
         col1     col2
0  21.02.2020  string1
it would return 1 for string1.
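I believe you need to count the number of rows of each consecutive run within each weekly span. Note that freq='W-Mon' produces weekly bins that end on a Monday; use 'W-SUN' for bins that run from Monday to Sunday: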
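import pandas as pd
# Parse the day-first date strings (e.g. 21.02.2020) into datetimes
df['col1'] = pd.to_datetime(df['col1'], dayfirst=True)
# Give each run of consecutive identical strings its own group number
g = df['col2'].ne(df['col2'].shift()).cumsum()
# Count rows per run and per weekly bin ('W-Mon' bins end on Monday)
df1 = df.groupby([g, pd.Grouper(freq='W-Mon', key='col1')])['col2'].agg(['first','size'])
print(df1)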
                   first  size
col2 col1
1    2020-02-17  string1     1
     2020-02-24  string1     2
2    2020-02-17  string2     1
3    2020-02-10  string3     2
4    2020-02-03  string1     1
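To see what the `g` helper does on its own, here is a minimal sketch on the same strings. The names `s`, `starts` and `run_id` are mine for illustration and are not part of the answer:

```python
import pandas as pd

s = pd.Series(['string1', 'string1', 'string1', 'string2',
               'string3', 'string3', 'string1'])

# True wherever the value differs from the previous row
# (the first row compares against NaN, so it is always True)
starts = s.ne(s.shift())

# The cumulative sum of the run starts gives every consecutive run its own label
run_id = starts.cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 3, 3, 4]
```

The last `string1` gets label 4 rather than 1, which is why it is counted separately from the first three rows.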
# Keep, for each string, the row with the largest weekly count
df2 = (df1.sort_values('size')
          .drop_duplicates('first', keep='last')
          .reset_index(level=0, drop=True))
print(df2)
              first  size
col1
2020-02-17  string2     1
2020-02-24  string1     2
2020-02-10  string3     2
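The two-week part of the question is not covered above, so here is a rough sketch of one way it might be approached. It rests on two assumptions of mine: that every occurrence in a week should be counted (not only consecutive runs), and that weeks run Monday to Sunday, hence `freq='W-SUN'`. The names `weekly`, `all_weeks`, `max_one_week` and `max_two_weeks` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['21.02.2020', '19.02.2020', '16.02.2020', '14.02.2020',
             '10.02.2020', '08.02.2020', '02.02.2020'],
    'col2': ['string1', 'string1', 'string1', 'string2',
             'string3', 'string3', 'string1'],
})
df['col1'] = pd.to_datetime(df['col1'], dayfirst=True)

# One row per calendar week, one column per string.
# 'W-SUN' bins end on Sunday, so each bin covers Monday through Sunday.
weekly = (df.groupby([pd.Grouper(key='col1', freq='W-SUN'), 'col2'])
            .size()
            .unstack(fill_value=0))

# Add weeks without any rows as all-zero rows, so that the rolling
# window below always spans two adjacent calendar weeks.
all_weeks = pd.date_range(weekly.index.min(), weekly.index.max(), freq='W-SUN')
weekly = weekly.reindex(all_weeks, fill_value=0)

# Highest count per string in any single Monday-to-Sunday week
max_one_week = weekly.max()

# Highest count per string in any two adjacent weeks
max_two_weeks = weekly.rolling(2, min_periods=1).sum().max().astype(int)

print(max_one_week)
print(max_two_weeks)
```

With the sample data this should give 2, 1, 1 for string1, string2, string3 over one week, and 3, 1, 2 over two weeks. If only consecutive runs should count, the `g` helper from the answer would have to be added to the grouping keys.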