I have this dataframe df:
id date time record
0 1 2021-07-08 3:00:00 8
1 1 2021-07-08 5:30:00 7
2 1 2021-07-08 9:00:00 10
3 1 2021-01-08 6:30:00 5
4 1 2021-01-08 9:30:00 7
5 2 2021-07-08 3:00:00 7
6 2 2021-07-08 9:00:00 14
7 2 2021-07-08 5:30:00 10
8 2 2021-01-08 3:00:00 11
9 2 2021-01-08 3:00:00 13
I need to create a new column max equal to the maximum of record grouped by id and date, but counting only the rows where time is earlier than 7:00:00. I.e., for id=1 and date=2021-07-08 the max column should be equal to 8, not 10, because the 10 was recorded when time was 9:00:00.
Here's the dataframe df in a more accessible way:
import io
import pandas as pd
data1_txt = """
id,date,time,record
1,2021-07-08,3:00:00,8
1,2021-07-08,5:30:00,7
1,2021-07-08,9:00:00,10
1,2021-01-08,6:30:00,5
1,2021-01-08,9:30:00,7
2,2021-07-08,3:00:00,7
2,2021-07-08,9:00:00,14
2,2021-07-08,5:30:00,10
2,2021-01-08,3:00:00,11
2,2021-01-08,3:00:00,13
"""
df = pd.read_csv(io.StringIO(data1_txt))
The desired result is:
id date time record max
0 1 2021-07-08 3:00:00 8 8
1 1 2021-07-08 5:30:00 7 8
2 1 2021-07-08 9:00:00 10 8
3 1 2021-01-08 6:30:00 5 5
4 1 2021-01-08 9:30:00 7 5
5 2 2021-07-08 3:00:00 7 10
6 2 2021-07-08 9:00:00 14 10
7 2 2021-07-08 5:30:00 10 10
8 2 2021-01-08 3:00:00 11 13
9 2 2021-01-08 3:00:00 13 13
Let us do it in steps:
Compare the hour component of time with 7 to create a boolean mask.
Mask the values in the record column where the hour is greater than or equal to 7.
Group the masked column by id and date and transform using max to calculate the maximum value per group.
m = pd.to_timedelta(df['time']).dt.components['hours'].ge(7)
df['max'] = df['record'].mask(m).groupby([df['id'], df['date']]).transform('max')
id date time record max
0 1 2021-07-08 3:00:00 8 8.0
1 1 2021-07-08 5:30:00 7 8.0
2 1 2021-07-08 9:00:00 10 8.0
3 1 2021-01-08 6:30:00 5 5.0
4 1 2021-01-08 9:30:00 7 5.0
5 2 2021-07-08 3:00:00 7 10.0
6 2 2021-07-08 9:00:00 14 10.0
7 2 2021-07-08 5:30:00 10 10.0
8 2 2021-01-08 3:00:00 11 13.0
9 2 2021-01-08 3:00:00 13 13.0
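The max column above comes back as floats because mask replaces the late rows with NaN before the group maximum is taken. Below is a minimal sketch of the same three steps with the intermediate values named, followed by a cast to pandas' nullable integer dtype. The names late, early_record and out are illustrative and not part of the answer, and the cast is an optional extra step, assuming whole-number output is wanted:

```python
import io
import pandas as pd

data1_txt = """
id,date,time,record
1,2021-07-08,3:00:00,8
1,2021-07-08,5:30:00,7
1,2021-07-08,9:00:00,10
1,2021-01-08,6:30:00,5
1,2021-01-08,9:30:00,7
2,2021-07-08,3:00:00,7
2,2021-07-08,9:00:00,14
2,2021-07-08,5:30:00,10
2,2021-01-08,3:00:00,11
2,2021-01-08,3:00:00,13
"""
df = pd.read_csv(io.StringIO(data1_txt))

# Step 1: True where the hour component of time is 7 or later.
late = pd.to_timedelta(df['time']).dt.components['hours'].ge(7)

# Step 2: hide the late records; they become NaN, so the column turns float.
early_record = df['record'].mask(late)

# Step 3: broadcast each (id, date) group's maximum back to every row.
out = df.assign(max=early_record.groupby([df['id'], df['date']]).transform('max'))

# Optional: the nullable Int64 dtype restores whole numbers and would still
# allow <NA> for a group that has no row before 7:00.
out['max'] = out['max'].astype('Int64')
print(out)
```

Using Int64 rather than a plain astype(int) is a cautious choice: a plain integer cast would raise if some group had only rows at or after 7:00.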
Let's make sure that time is a datetime type:
df['time'] = pd.to_datetime(df['time'])
Filter times before 7:00, group and apply transform:
s = df.loc[(df['time'] < '7:00')].groupby(['id', 'date'])['record'].transform('max').rename('max')
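Merge the result and forward-fill the rows that were filtered out (this works here because every late row is preceded by an earlier-than-7:00 row of the same group):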
df2 = pd.concat([df, s], axis=1)
df2['max'] = df2['max'].ffill().astype(int)
Output:
id date time record max
0 1 2021-07-08 2021-07-08 03:00:00 8 8
1 1 2021-07-08 2021-07-08 05:30:00 7 8
2 1 2021-07-08 2021-07-08 09:00:00 10 8
3 1 2021-01-08 2021-07-08 06:30:00 5 5
4 1 2021-01-08 2021-07-08 09:30:00 7 5
5 2 2021-07-08 2021-07-08 03:00:00 7 10
6 2 2021-07-08 2021-07-08 09:00:00 14 10
7 2 2021-07-08 2021-07-08 05:30:00 10 10
8 2 2021-01-08 2021-07-08 03:00:00 11 13
9 2 2021-01-08 2021-07-08 03:00:00 13 13
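The forward fill in this answer depends on row order: a late row takes its value from whichever earlier-than-7:00 row sits above it, whether or not that row belongs to the same group. As a hedged alternative, the sketch below computes one maximum per (id, date) group and merges it back on the keys, so the result does not depend on how the rows are sorted. This is a variation on the answer and not its original method; the names early and group_max are illustrative:

```python
import io
import pandas as pd

data1_txt = """
id,date,time,record
1,2021-07-08,3:00:00,8
1,2021-07-08,5:30:00,7
1,2021-07-08,9:00:00,10
1,2021-01-08,6:30:00,5
1,2021-01-08,9:30:00,7
2,2021-07-08,3:00:00,7
2,2021-07-08,9:00:00,14
2,2021-07-08,5:30:00,10
2,2021-01-08,3:00:00,11
2,2021-01-08,3:00:00,13
"""
df = pd.read_csv(io.StringIO(data1_txt))

# Rows strictly earlier than 7:00:00, compared as timedeltas so that
# no calendar date gets attached to the time column.
early = df[pd.to_timedelta(df['time']) < pd.Timedelta(hours=7)]

# One maximum per (id, date) group, kept as a small lookup table.
group_max = (
    early.groupby(['id', 'date'], as_index=False)['record']
    .max()
    .rename(columns={'record': 'max'})
)

# A left merge keeps every original row in its original order; a group
# with no row before 7:00 would get NaN instead of a neighbour's value.
df2 = df.merge(group_max, on=['id', 'date'], how='left')
print(df2)
```

With this sample data every group has at least one early row, so max stays an integer column and no fill step is needed.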