Time between two rows with conditions in pandas

Question

Let's suppose we have a pandas dataframe with work shifts:

import pandas as pd

df_aux = pd.DataFrame({'Worker' : ['Alice','Alice','Alice','Alice','Alice', 'Bob','Bob','Bob'],
                       'Shift_start' : ['2022-01-01 10:00:00', '2022-01-01 10:30:00', '2022-01-01 11:45:00', '2022-01-01 12:45:00', '2022-01-01 13:15:00', '2022-01-01 10:30:00', '2022-01-01 12:00:00', '2022-01-01 13:15:00'],
                       'Shift_end' :   ['2022-01-01 10:15:00', '2022-01-01 11:45:00', '2022-01-01 12:30:00', '2022-01-01 13:15:00', '2022-01-01 14:00:00', '2022-01-01 11:30:00', '2022-01-01 13:10:00', '2022-01-01 14:30:00'],
                       'Position' : [1, 1, 2, 2, 2, 1, 2, 3],
                       'Role' : ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A']})

Worker	Shift_start	Shift_end	Position	Role
Alice	2022-01-01 10:00:00	2022-01-01 10:15:00	1	A
Alice	2022-01-01 10:30:00	2022-01-01 11:45:00	1	B
Alice	2022-01-01 11:45:00	2022-01-01 12:30:00	2	B
Alice	2022-01-01 12:45:00	2022-01-01 13:15:00	2	A
Alice	2022-01-01 13:15:00	2022-01-01 14:00:00	2	B
Bob	2022-01-01 10:30:00	2022-01-01 11:30:00	1	A
Bob	2022-01-01 12:00:00	2022-01-01 13:10:00	2	B
Bob	2022-01-01 13:15:00	2022-01-01 14:30:00	3	A

The Position column indicates where each worker is, and there are two roles, A and B (say, main and auxiliary). At the time of certain events, I need to compute how long each worker has been at the current position regardless of role (Time_in_position), and how long they have been in the same position AND role (Time_in_position_role). These events are given in df_main, which records the time and position of each event:

df_main = pd.DataFrame({'Event_time' : ['2022-01-01 11:05:00', '2022-01-01 12:35:00', '2022-01-01 13:25:00'] ,
                        'Position' : [1, 2, 2]})

Event_time	Position
2022-01-01 11:05:00	1
2022-01-01 12:35:00	2
2022-01-01 13:25:00	2

The idea would be to perform a merge between df_main and df_aux to have the following info:

Event_time	Worker	Shift_start	Shift_end	Position	Role	Time_in_position	Time_in_position_role
2022-01-01 11:05:00	Alice	2022-01-01 10:30:00	2022-01-01 11:45:00	1	B	1 hours 05 minutes	0 hours 35 minutes
2022-01-01 11:05:00	Bob	2022-01-01 10:30:00	2022-01-01 11:30:00	1	A	0 hours 35 minutes	0 hours 35 minutes
2022-01-01 12:35:00	Bob	2022-01-01 12:00:00	2022-01-01 13:10:00	2	B	0 hours 35 minutes	0 hours 35 minutes
2022-01-01 13:25:00	Alice	2022-01-01 13:15:00	2022-01-01 14:00:00	2	B	1 hours 40 minutes	0 hours 10 minutes

The first event appears twice because both Alice and Bob were in that position at the time of the event, but with different roles. I managed to compute the Time_in_position_role column:

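# Parse the timestamp strings so they can be compared and subtracted
df_main['Event_time'] = pd.to_datetime(df_main['Event_time'])
df_aux[['Shift_start', 'Shift_end']] = df_aux[['Shift_start', 'Shift_end']].apply(pd.to_datetime)
df_full = df_main.merge(df_aux, on='Position')
df_full = df_full[(df_full['Event_time']>df_full['Shift_start']) & (df_full['Event_time']<df_full['Shift_end'])]
df_full['Time_in_position_role'] = df_full['Event_time'] - df_full['Shift_start']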

But I am unable to do the same for the Time_in_position one. Any ideas?

Answer 1

The logic is:

For each "Worker", find the time period during which they stayed in a particular position. If that period spans several consecutive rows, merge them into one.
Join this with your result df and apply the same filter logic to get "Time_in_position".

# For each "Worker", label consecutive shifts (ordered by start time) that share
# the same position with a common "rank".
def sort_n_rank(g):
    df_g = g.apply(pd.Series)   # split (Shift_start, Position) tuples into columns 0 and 1
    df_g = df_g.sort_values(0)  # order the worker's shifts by start time
    return (df_g[1] != df_g[1].shift(1)).cumsum()  # new rank whenever the position changes

df_aux["start_position"] = df_aux[["Shift_start", "Position"]].apply(tuple, axis=1)
df_aux["rank"] = df_aux.groupby("Worker")[["start_position"]].transform(sort_n_rank)

# Collapse each run of rows into a single period covering all of its shifts.
df_worker_position = (
    df_aux.groupby(["Worker", "rank"])
          .agg(
              Shift_start_min=("Shift_start", "min"),
              Shift_end_max=("Shift_end", "max"),
              Position=("Position", "first"),
          )
          .reset_index()
)

# Attach the merged periods to the events and keep the period containing each event.
df_full = df_full.merge(df_worker_position, on=["Worker", "Position"])
df_full = df_full[(df_full["Event_time"] > df_full["Shift_start_min"]) & (df_full["Event_time"] < df_full["Shift_end_max"])]
df_full["Time_in_position"] = df_full["Event_time"] - df_full["Shift_start_min"]

Output:

           Event_time Worker         Shift_start           Shift_end  Position Role Time_in_position Time_in_position_role
0 2022-01-01 11:05:00  Alice 2022-01-01 10:30:00 2022-01-01 11:45:00         1    B  0 days 01:05:00       0 days 00:35:00
1 2022-01-01 11:05:00    Bob 2022-01-01 10:30:00 2022-01-01 11:30:00         1    A  0 days 00:35:00       0 days 00:35:00
2 2022-01-01 12:35:00    Bob 2022-01-01 12:00:00 2022-01-01 13:10:00         2    B  0 days 00:35:00       0 days 00:35:00
3 2022-01-01 13:25:00  Alice 2022-01-01 13:15:00 2022-01-01 14:00:00         2    B  0 days 01:40:00       0 days 00:10:00
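
For reference, here is a self-contained sketch of the same idea that can be run from scratch. It is a variation rather than the exact code above: it labels runs of the same position with a shift/cumsum comparison instead of tuples, and attaches the start of each run to every shift row, so the second merge and filter are not needed. It assumes that a worker's shifts do not overlap and that a gap between two consecutive shifts at the same position does not reset the clock (as in the expected output of the question). The names stint, Position_start and format_hm are made up for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df_aux = pd.DataFrame({
    "Worker": ["Alice"] * 5 + ["Bob"] * 3,
    "Shift_start": ["2022-01-01 10:00:00", "2022-01-01 10:30:00", "2022-01-01 11:45:00",
                    "2022-01-01 12:45:00", "2022-01-01 13:15:00", "2022-01-01 10:30:00",
                    "2022-01-01 12:00:00", "2022-01-01 13:15:00"],
    "Shift_end": ["2022-01-01 10:15:00", "2022-01-01 11:45:00", "2022-01-01 12:30:00",
                  "2022-01-01 13:15:00", "2022-01-01 14:00:00", "2022-01-01 11:30:00",
                  "2022-01-01 13:10:00", "2022-01-01 14:30:00"],
    "Position": [1, 1, 2, 2, 2, 1, 2, 3],
    "Role": ["A", "B", "B", "A", "B", "A", "B", "A"],
})
df_main = pd.DataFrame({
    "Event_time": ["2022-01-01 11:05:00", "2022-01-01 12:35:00", "2022-01-01 13:25:00"],
    "Position": [1, 2, 2],
})

# Parse the strings into timestamps so they can be compared and subtracted.
df_main["Event_time"] = pd.to_datetime(df_main["Event_time"])
for col in ["Shift_start", "Shift_end"]:
    df_aux[col] = pd.to_datetime(df_aux[col])

# Label each uninterrupted run of the same position per worker ("stint" is a
# hypothetical helper column): the counter increases whenever the position
# differs from the worker's previous shift.
df_aux = df_aux.sort_values(["Worker", "Shift_start"]).reset_index(drop=True)
changed = df_aux["Position"].ne(df_aux.groupby("Worker")["Position"].shift())
df_aux["stint"] = changed.astype(int).groupby(df_aux["Worker"]).cumsum()

# Broadcast the start of the stint to every shift row that belongs to it.
df_aux["Position_start"] = (
    df_aux.groupby(["Worker", "stint"])["Shift_start"].transform("min")
)

# Match each event to the shifts that were active at that position and time.
df_full = df_main.merge(df_aux, on="Position")
active = (df_full["Event_time"] > df_full["Shift_start"]) & (
    df_full["Event_time"] < df_full["Shift_end"]
)
df_full = df_full[active].reset_index(drop=True)

df_full["Time_in_position"] = df_full["Event_time"] - df_full["Position_start"]
df_full["Time_in_position_role"] = df_full["Event_time"] - df_full["Shift_start"]


def format_hm(td):
    """Render a Timedelta in the 'H hours MM minutes' style used in the question."""
    minutes = int(td.total_seconds() // 60)
    return f"{minutes // 60} hours {minutes % 60:02d} minutes"


print(df_full[["Event_time", "Worker", "Position", "Role"]].assign(
    Time_in_position=df_full["Time_in_position"].map(format_hm),
    Time_in_position_role=df_full["Time_in_position_role"].map(format_hm),
))
```

If a gap between shifts should reset the time in position, the changed flag would also need to compare each Shift_start with the previous Shift_end of the same worker.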
