Time between two rows with conditions in pandas

Let's suppose we have a pandas dataframe with work shifts:

import pandas as pd

df_aux = pd.DataFrame({'Worker' : ['Alice','Alice','Alice','Alice','Alice', 'Bob','Bob','Bob'],
                       'Shift_start' : ['2022-01-01 10:00:00', '2022-01-01 10:30:00', '2022-01-01 11:45:00', '2022-01-01 12:45:00', '2022-01-01 13:15:00', '2022-01-01 10:30:00', '2022-01-01 12:00:00', '2022-01-01 13:15:00'],
                       'Shift_end' :   ['2022-01-01 10:15:00', '2022-01-01 11:45:00', '2022-01-01 12:30:00', '2022-01-01 13:15:00', '2022-01-01 14:00:00', '2022-01-01 11:30:00', '2022-01-01 13:10:00', '2022-01-01 14:30:00'],
                       'Position' : [1, 1, 2, 2, 2, 1, 2, 3],
                       'Role' : ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A']})
# Parse the timestamp strings so that they can be compared and subtracted.
df_aux[['Shift_start', 'Shift_end']] = df_aux[['Shift_start', 'Shift_end']].apply(pd.to_datetime)

Worker  Shift_start          Shift_end            Position  Role
Alice   2022-01-01 10:00:00  2022-01-01 10:15:00  1         A
Alice   2022-01-01 10:30:00  2022-01-01 11:45:00  1         B
Alice   2022-01-01 11:45:00  2022-01-01 12:30:00  2         B
Alice   2022-01-01 12:45:00  2022-01-01 13:15:00  2         A
Alice   2022-01-01 13:15:00  2022-01-01 14:00:00  2         B
Bob     2022-01-01 10:30:00  2022-01-01 11:30:00  1         A
Bob     2022-01-01 12:00:00  2022-01-01 13:10:00  2         B
Bob     2022-01-01 13:15:00  2022-01-01 14:30:00  3         A

The Position column is the place where the worker is stationed, and there are two roles, A and B (say, main and auxiliary). For each of certain events, I need to compute how long each worker has been at the current position, regardless of role, and how long they have been in the same position AND role. The events are given in df_main, which records the time and position of each event:

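df_main = pd.DataFrame({'Event_time' : ['2022-01-01 11:05:00', '2022-01-01 12:35:00', '2022-01-01 13:25:00'],
                        'Position' : [1, 2, 2]})
# Parse the event times as datetimes as well.
df_main['Event_time'] = pd.to_datetime(df_main['Event_time'])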

Event_time           Position
2022-01-01 11:05:00  1
2022-01-01 12:35:00  2
2022-01-01 13:25:00  2

The idea would be to perform a merge between df_main and df_aux to have the following info:

Event_time           Worker  Shift_start          Shift_end            Position  Role  Time_in_position    Time_in_position_role
2022-01-01 11:05:00  Alice   2022-01-01 10:30:00  2022-01-01 11:45:00  1         B     1 hours 05 minutes  0 hours 35 minutes
2022-01-01 11:05:00  Bob     2022-01-01 10:30:00  2022-01-01 11:30:00  1         A     0 hours 35 minutes  0 hours 35 minutes
2022-01-01 12:35:00  Bob     2022-01-01 12:00:00  2022-01-01 13:10:00  2         B     0 hours 35 minutes  0 hours 35 minutes
2022-01-01 13:25:00  Alice   2022-01-01 13:15:00  2022-01-01 14:00:00  2         B     1 hours 40 minutes  0 hours 10 minutes

The first event appears twice because both Alice and Bob were at that position at the time of the event, although in different roles. I managed to compute the Time_in_position_role column:

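# Pair every event with every shift at the same position.
df_full = df_main.merge(df_aux, on='Position')
# Keep only the shifts that were in progress at the time of the event.
df_full = df_full[(df_full['Event_time']>df_full['Shift_start']) & (df_full['Event_time']<df_full['Shift_end'])].copy()
# Time since the start of the current shift (requires datetime columns).
df_full['Time_in_position_role'] = df_full['Event_time'] - df_full['Shift_start']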

But I am unable to do the same for the Time_in_position one. Any ideas?

The logic is:

  • For each "Worker", find the time period during which they stayed at a particular position. If that stay spans several consecutive rows, merge them into one.
  • Join this with your result df and apply the same filter logic to get "Time_in_position".
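
The first step relies on labelling runs of consecutive shifts at the same position. A minimal sketch of that idea on its own, assuming the rows are already in chronological order for each worker; the `toy` frame and the `block` column name are illustrative and do not appear in the code of this answer:

```python
import pandas as pd

# Illustrative data: the Worker/Position sequence from df_aux, in chronological order.
toy = pd.DataFrame({
    "Worker":   ["Alice", "Alice", "Alice", "Alice", "Alice", "Bob", "Bob", "Bob"],
    "Position": [1, 1, 2, 2, 2, 1, 2, 3],
})

# Position of the previous shift of the same worker (NaN for a worker's first shift).
prev_position = toy.groupby("Worker")["Position"].shift()

# A new block starts whenever the position differs from the previous shift.
is_new_block = toy["Position"] != prev_position

# Cumulative sum of the flags within each worker numbers the blocks 1, 2, ...
toy["block"] = is_new_block.astype(int).groupby(toy["Worker"]).cumsum()

print(toy)
```

Grouping by worker and block and taking the earliest start and latest end then gives one row per stay at a position. A worker who later returns to an earlier position gets a new block number, so the two stays are kept apart.
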
# Step 1: for each "Worker", label runs of consecutive shifts at the same position
# and collapse each run into one row (earliest start, latest end).
def sort_n_rank(g):
    # g holds (Shift_start, Position) tuples; expand them into columns 0 and 1.
    df_g = g.apply(pd.Series)
    # Order the shifts chronologically.
    df_g = df_g.sort_values(0)
    # Start a new run whenever the position differs from the previous shift.
    return (df_g[1] != df_g[1].shift(1)).cumsum()

df_aux["start_position"] = df_aux[["Shift_start", "Position"]].apply(tuple, axis=1)
df_aux["rank"] = df_aux.groupby("Worker")[["start_position"]].transform(sort_n_rank)
df_worker_position = (
    df_aux.groupby(["Worker", "rank"])
          .agg(
              Shift_start_min=("Shift_start", "min"),
              Shift_end_max=("Shift_end", "max"),
              Position=("Position", "first"),
          )
          .reset_index()
)

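# Step 2: attach the merged stays to the events and keep the stay that contains each event.
df_full = df_full.merge(df_worker_position, on=["Worker", "Position"])
df_full = df_full[(df_full["Event_time"] > df_full["Shift_start_min"]) & (df_full["Event_time"] < df_full["Shift_end_max"])].copy()
df_full["Time_in_position"] = df_full["Event_time"] - df_full["Shift_start_min"]
# Keep only the columns shown in the output below, without the helper columns.
df_full = df_full[["Event_time", "Worker", "Shift_start", "Shift_end", "Position", "Role", "Time_in_position", "Time_in_position_role"]]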

Output:

           Event_time Worker         Shift_start           Shift_end  Position Role Time_in_position Time_in_position_role
0 2022-01-01 11:05:00  Alice 2022-01-01 10:30:00 2022-01-01 11:45:00         1    B  0 days 01:05:00       0 days 00:35:00
1 2022-01-01 11:05:00    Bob 2022-01-01 10:30:00 2022-01-01 11:30:00         1    A  0 days 00:35:00       0 days 00:35:00
2 2022-01-01 12:35:00    Bob 2022-01-01 12:00:00 2022-01-01 13:10:00         2    B  0 days 00:35:00       0 days 00:35:00
3 2022-01-01 13:25:00  Alice 2022-01-01 13:15:00 2022-01-01 14:00:00         2    B  0 days 01:40:00       0 days 00:10:00
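
The durations above are pandas Timedelta values, while the question shows them as "1 hours 05 minutes". If that text format is wanted, a small helper along these lines could convert them. This is only a sketch: `format_hours_minutes` is a hypothetical name, leftover seconds are truncated, and negative durations are not handled:

```python
import pandas as pd

def format_hours_minutes(delta: pd.Timedelta) -> str:
    """Render a Timedelta as 'H hours MM minutes' (hypothetical helper)."""
    # Whole minutes contained in the duration; any leftover seconds are dropped.
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours {minutes:02d} minutes"

# Sample durations copied from the Time_in_position column above.
durations = pd.Series(pd.to_timedelta(["0 days 01:05:00", "0 days 00:35:00", "0 days 01:40:00"]))
formatted = durations.apply(format_hours_minutes)
print(formatted.tolist())
```

The same call could be applied to `df_full["Time_in_position"]` and `df_full["Time_in_position_role"]`. The formatted columns hold plain strings, so any further arithmetic should be done on the Timedelta columns first.
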
