
Get maximum for each group based on time condition

I have this dataframe df

   id        date     time  record
0   1  2021-07-08  3:00:00       8
1   1  2021-07-08  5:30:00       7
2   1  2021-07-08  9:00:00      10
3   1  2021-01-08  6:30:00       5
4   1  2021-01-08  9:30:00       7
5   2  2021-07-08  3:00:00       7
6   2  2021-07-08  9:00:00      14
7   2  2021-07-08  5:30:00      10
8   2  2021-01-08  3:00:00      11
9   2  2021-01-08  3:00:00      13

I need to create a new column max equal to the maximum of record grouped by id and date, but counting only rows whose time is earlier than 7:00:00. I.e., for id=1 and date=2021-07-08 the max column should be 8, not 10, because the 10 was recorded at 9:00:00.

Here is the dataframe df in a form that is easier to copy:

import io
import pandas as pd

data1_txt = """
id,date,time,record
1,2021-07-08,3:00:00,8
1,2021-07-08,5:30:00,7
1,2021-07-08,9:00:00,10
1,2021-01-08,6:30:00,5
1,2021-01-08,9:30:00,7
2,2021-07-08,3:00:00,7
2,2021-07-08,9:00:00,14
2,2021-07-08,5:30:00,10
2,2021-01-08,3:00:00,11
2,2021-01-08,3:00:00,13
"""

df = pd.read_csv(io.StringIO(data1_txt))

The desired result is:

   id        date     time  record  max
0   1  2021-07-08  3:00:00       8    8
1   1  2021-07-08  5:30:00       7    8
2   1  2021-07-08  9:00:00      10    8
3   1  2021-01-08  6:30:00       5    5
4   1  2021-01-08  9:30:00       7    5
5   2  2021-07-08  3:00:00       7   10
6   2  2021-07-08  9:00:00      14   10
7   2  2021-07-08  5:30:00      10   10
8   2  2021-01-08  3:00:00      11   13
9   2  2021-01-08  3:00:00      13   13
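Both answers below convert time before comparing it with 7:00. There is a good reason for this. Compared as plain strings, times are ordered character by character, so '10:30:00' sorts before '7:00:00'. The question's data happens to contain no time of 10:00 or later, so the sample times in this illustration are hypothetical:

```python
import pandas as pd

# Hypothetical sample times, including one after 10:00 that is absent from the question's data
times = pd.Series(['3:00:00', '9:00:00', '10:30:00'])

# Lexicographic string comparison: '1' < '7', so '10:30:00' wrongly counts as "before 7:00"
as_strings = times < '7:00:00'

# Converting to Timedelta compares actual durations since midnight
as_timedeltas = pd.to_timedelta(times) < pd.Timedelta(hours=7)

print(as_strings.tolist())     # [True, False, True]
print(as_timedeltas.tolist())  # [True, False, False]
```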

Let us do it in steps:

  • Extract the hours component from the time column
  • Compare the hours component with 7 to create a boolean mask
  • Mask the values in the record column where the hour is 7 or later
  • Group the masked column by id and date and transform using max to calculate the maximum value per group
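# True where the hour of time is 7 or later (those records are ignored)
m = pd.to_timedelta(df['time']).dt.components['hours'].ge(7)
# Replace the late records with NaN, then broadcast the per-(id, date) max to every row
df['max'] = df['record'].mask(m).groupby([df['id'], df['date']]).transform('max')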

   id        date     time  record   max
0   1  2021-07-08  3:00:00       8   8.0
1   1  2021-07-08  5:30:00       7   8.0
2   1  2021-07-08  9:00:00      10   8.0
3   1  2021-01-08  6:30:00       5   5.0
4   1  2021-01-08  9:30:00       7   5.0
5   2  2021-07-08  3:00:00       7  10.0
6   2  2021-07-08  9:00:00      14  10.0
7   2  2021-07-08  5:30:00      10  10.0
8   2  2021-01-08  3:00:00      11  13.0
9   2  2021-01-08  3:00:00      13  13.0
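The max column above comes out as float. This happens because mask turns the series into floats, since an int64 column cannot hold NaN. The sketch below is a self-contained variant of the same approach, with two changes:

- It compares the whole time of day with pd.Timedelta(hours=7), which is equivalent to the hours check for these data.
- It casts the result to pandas' nullable Int64 dtype. A group with no record before 7:00 would then show <NA> instead of breaking an integer cast.

```python
import io
import pandas as pd

data1_txt = """
id,date,time,record
1,2021-07-08,3:00:00,8
1,2021-07-08,5:30:00,7
1,2021-07-08,9:00:00,10
1,2021-01-08,6:30:00,5
1,2021-01-08,9:30:00,7
2,2021-07-08,3:00:00,7
2,2021-07-08,9:00:00,14
2,2021-07-08,5:30:00,10
2,2021-01-08,3:00:00,11
2,2021-01-08,3:00:00,13
"""
df = pd.read_csv(io.StringIO(data1_txt))

# True for rows recorded at 7:00:00 or later
late = pd.to_timedelta(df['time']) >= pd.Timedelta(hours=7)

# Hide late records, take the max per (id, date) and broadcast it back to every row;
# Int64 keeps whole numbers as integers and would show <NA> for a group with no early record
df['max'] = (
    df['record']
    .mask(late)
    .groupby([df['id'], df['date']])
    .transform('max')
    .astype('Int64')
)

print(df)
```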

Let's make sure that time is of datetime type:

df['time'] = pd.to_datetime(df['time'])

Filter times before 7:00, group and apply transform:

# Compare only the hour, so the date that to_datetime attaches to a bare time does not matter
s = df.loc[df['time'].dt.hour < 7].groupby(['id', 'date'])['record'].transform('max').rename('max')

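Merge the result back and forward-fill within each id/date group. A plain ffill() would copy from the previous row even when that row belongs to another group. This still assumes the first row of each group is before 7:00:

df2 = pd.concat([df, s], axis=1)
# Rows at 7:00 or later are NaN after the concat; fill them from an earlier row of the same group
df2['max'] = df2.groupby(['id', 'date'])['max'].ffill().astype(int)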

output:

   id       date                time  record  max
0   1 2021-07-08 2021-07-08 03:00:00       8    8
1   1 2021-07-08 2021-07-08 05:30:00       7    8
2   1 2021-07-08 2021-07-08 09:00:00      10    8
3   1 2021-01-08 2021-07-08 06:30:00       5    5
4   1 2021-01-08 2021-07-08 09:30:00       7    5
5   2 2021-07-08 2021-07-08 03:00:00       7   10
6   2 2021-07-08 2021-07-08 09:00:00      14   10
7   2 2021-07-08 2021-07-08 05:30:00      10   10
8   2 2021-01-08 2021-07-08 03:00:00      11   13
9   2 2021-01-08 2021-07-08 03:00:00      13   13
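A third option skips the concat-and-fill step entirely. It is sketched here as an alternative, not as either answerer's method. First aggregate the rows before 7:00 once per (id, date), then left-merge that small table back onto df. Because the merge is keyed rather than positional, row order does not matter, and a group with no early record simply gets NaN. The extra group (3, 2021-07-08) is hypothetical and is added only to show that case:

```python
import io
import pandas as pd

# The last row (id 3) is hypothetical: a group with no record before 7:00
data1_txt = """
id,date,time,record
1,2021-07-08,3:00:00,8
1,2021-07-08,5:30:00,7
1,2021-07-08,9:00:00,10
1,2021-01-08,6:30:00,5
1,2021-01-08,9:30:00,7
2,2021-07-08,3:00:00,7
2,2021-07-08,9:00:00,14
2,2021-07-08,5:30:00,10
2,2021-01-08,3:00:00,11
2,2021-01-08,3:00:00,13
3,2021-07-08,9:00:00,4
"""
df = pd.read_csv(io.StringIO(data1_txt))

# Keep only rows before 7:00, then take one max per (id, date)
early = df[pd.to_timedelta(df['time']) < pd.Timedelta(hours=7)]
group_max = (
    early.groupby(['id', 'date'], as_index=False)['record']
    .max()
    .rename(columns={'record': 'max'})
)

# A left merge keeps every row of df, in its original order
df2 = df.merge(group_max, on=['id', 'date'], how='left')
print(df2)
```

The price is a float max column whenever some group has no early record. If integers are needed, .astype('Int64') can be applied afterwards.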
