
Group by and add new column with min value between dates - pandas

I have this Pandas dataframe:


I want a new DataFrame that groups the rows by ['ticket_id', 'time_a'] and adds a new column with the minimum time difference in hours. This SQL does what I want:

SELECT ticket_id, DATEDIFF('hh', time_a, MIN(time_b)) each_diff from ...

I've tried grouping them, but the result is an object whose contents I can't see.
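Calling `groupby` on its own only builds a `DataFrameGroupBy` object; printing it shows the object's type and address rather than any data. A minimal sketch of how to look inside it, using the same sample values as the first answer (the variable name `grouped` is just for illustration):

```python
import pandas as pd

# Sample data (same values as in the first answer)
df = pd.DataFrame({
    'ticket_id': [1, 2, 2],
    'time_a': pd.to_datetime(['2021-07-21 12:00:01'] * 3),
    'time_b': pd.to_datetime(['2021-07-21 14:00:02',
                              '2021-07-21 13:00:05',
                              '2021-07-21 17:00:10']),
})

grouped = df.groupby(['ticket_id', 'time_a'])
print(type(grouped).__name__)  # DataFrameGroupBy

# Iterating yields (group key, sub-DataFrame) pairs
for key, group in grouped:
    print(key)
    print(group)

# An aggregation turns the groups into ordinary data:
# here a Series of the earliest time_b, indexed by the group keys
print(grouped['time_b'].min())
```

In short, the GroupBy object is an intermediate step; you see results only after applying an aggregation such as `min()` or `agg(...)`.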

For this sample data:

import pandas as pd

df = pd.DataFrame({
    'ticket_id': [1, 2, 2],
    'time_a': ['2021-07-21 12:00:01', '2021-07-21 12:00:01', '2021-07-21 12:00:01'],
    'time_b': ['2021-07-21 14:00:02', '2021-07-21 13:00:05', '2021-07-21 17:00:10']
})
df.time_a = pd.to_datetime(df.time_a)
df.time_b = pd.to_datetime(df.time_b)

which looks like this:

   ticket_id              time_a              time_b
0          1 2021-07-21 12:00:01 2021-07-21 14:00:02
1          2 2021-07-21 12:00:01 2021-07-21 13:00:05
2          2 2021-07-21 12:00:01 2021-07-21 17:00:10

then this code

df = df.groupby(['ticket_id', 'time_a'], as_index=False).agg(time_b_min=('time_b', 'min'))
df['diff'] = df.time_b_min - df.time_a

gives you

   ticket_id              time_a          time_b_min            diff
0          1 2021-07-21 12:00:01 2021-07-21 14:00:02 0 days 02:00:01
1          2 2021-07-21 12:00:01 2021-07-21 13:00:05 0 days 01:00:04
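The `diff` column above holds `Timedelta` values, while the SQL in the question returns a number of hours. One way to get a numeric hour count, sketched here on the assumption that fractional hours are acceptable, is `Series.dt.total_seconds()` divided by 3600. The names `out` and `diff_hours` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'ticket_id': [1, 2, 2],
    'time_a': pd.to_datetime(['2021-07-21 12:00:01'] * 3),
    'time_b': pd.to_datetime(['2021-07-21 14:00:02',
                              '2021-07-21 13:00:05',
                              '2021-07-21 17:00:10']),
})

# Same grouping as the answer: earliest time_b per (ticket_id, time_a)
out = df.groupby(['ticket_id', 'time_a'], as_index=False).agg(time_b_min=('time_b', 'min'))
out['diff'] = out.time_b_min - out.time_a

# Convert the Timedelta to fractional hours
out['diff_hours'] = out['diff'].dt.total_seconds() / 3600
print(out[['ticket_id', 'diff_hours']].round(4))
```

If you need whole hours instead, floor-divide the Timedelta by `pd.Timedelta(hours=1)`.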

To group the data and get a column with the minimum time_b value for each group, you can do:

df_grouped = df.groupby(['ticket_id', 'time_a'])['time_b'].min().reset_index()
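That one-liner works in two stages. Selecting `['time_b']` and calling `.min()` gives a Series indexed by a `(ticket_id, time_a)` MultiIndex, and `.reset_index()` turns those index levels back into ordinary columns. A short sketch that exposes both stages on the sample data (the name `per_group_min` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'ticket_id': [1, 2, 2],
    'time_a': pd.to_datetime(['2021-07-21 12:00:01'] * 3),
    'time_b': pd.to_datetime(['2021-07-21 14:00:02',
                              '2021-07-21 13:00:05',
                              '2021-07-21 17:00:10']),
})

# Stage 1: a Series whose index is built from the group keys
per_group_min = df.groupby(['ticket_id', 'time_a'])['time_b'].min()
print(list(per_group_min.index.names))  # ['ticket_id', 'time_a']

# Stage 2: move the index levels back into columns
df_grouped = per_group_min.reset_index()
print(df_grouped.columns.tolist())  # ['ticket_id', 'time_a', 'time_b']
```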

I don't know the data types of your time_a and time_b columns, but if they are timestamps, you can then do the following to get the difference in hours:

# whole elapsed hours; the older .astype('timedelta64[h]') idiom is not supported in pandas 2.x
df_grouped['each_diff'] = (df_grouped['time_b'] - df_grouped['time_a']) // pd.Timedelta(hours=1)
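A self-contained sketch of this second approach on the same sample data, assuming (as the answer does) that both columns are already datetimes. Floor-dividing by a one-hour `pd.Timedelta` yields whole elapsed hours. This truncates the elapsed time, which may not match how a particular SQL dialect's DATEDIFF counts hour boundaries:

```python
import pandas as pd

df = pd.DataFrame({
    'ticket_id': [1, 2, 2],
    'time_a': pd.to_datetime(['2021-07-21 12:00:01'] * 3),
    'time_b': pd.to_datetime(['2021-07-21 14:00:02',
                              '2021-07-21 13:00:05',
                              '2021-07-21 17:00:10']),
})

# Earliest time_b per (ticket_id, time_a), as a flat DataFrame
df_grouped = df.groupby(['ticket_id', 'time_a'])['time_b'].min().reset_index()

# Whole elapsed hours between time_a and that earliest time_b
df_grouped['each_diff'] = (df_grouped['time_b'] - df_grouped['time_a']) // pd.Timedelta(hours=1)
print(df_grouped[['ticket_id', 'each_diff']])
```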
