python: iterate in groupby pandas, add new columns based on previous value

Question

My dataframe is:

id            beg                       end                 
client1     2021-10-19 16:01:01       2021-10-21 08:19:17                
client1     2021-10-21 10:41:53       2021-10-24 07:53:57  
client3     2021-10-21 09:00:00       2021-10-21 10:00:00       
client3     2021-10-21 10:00:00       2021-10-22 14:00:00             
client2     2021-10-21 10:00:00       2021-10-21 14:00:00

I want to add columns based on the previous value of a column, grouped by id, as shown below.

If a client appears more than once, then on the second appearance of that client I want to create two new columns:

col1, which takes the previous end
col2, which takes the beg of the current row (in this example, for client1 and client3).

Otherwise, put nothing in col1 and col2.

Expected output:

id          beg                    end                    col1                   col2
client1     2021-10-19 16:01:01    2021-10-21 08:19:17    -                      -
client1     2021-10-21 10:41:53    2021-10-24 07:53:57    2021-10-21 08:19:17    2021-10-21 10:41:53
client3     2021-10-21 09:00:00    2021-10-21 10:00:00    -                      -
client3     2021-10-21 10:00:00    2021-10-22 14:00:00    2021-10-21 10:00:00    2021-10-21 10:00:00
client2     2021-10-21 10:00:00    2021-10-21 14:00:00    2021-10-21 10:00:00    2021-10-21 14:00:00

Answer 1

Let us start with the easy part: getting only the previous value.

We can use groupby + shift:

df['col1'] = df.groupby('id')['end'].shift()

output:

        id                  beg                  end                 col1
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00                  NaN
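
For anyone who wants to reproduce this step, here is a minimal self-contained sketch. It assumes the beg and end columns are parsed as datetimes, which the question does not state; with datetime columns the missing values show up as NaT rather than NaN. The name prev_end is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; parsing to datetime is an assumption.
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": pd.to_datetime([
        "2021-10-19 16:01:01", "2021-10-21 10:41:53",
        "2021-10-21 09:00:00", "2021-10-21 10:00:00",
        "2021-10-21 10:00:00",
    ]),
    "end": pd.to_datetime([
        "2021-10-21 08:19:17", "2021-10-24 07:53:57",
        "2021-10-21 10:00:00", "2021-10-22 14:00:00",
        "2021-10-21 14:00:00",
    ]),
})

# shift() moves values down by one row inside each id group, so the first
# row of every group has no previous value and becomes missing.
prev_end = df.groupby("id")["end"].shift()
print(prev_end)
```

The result keeps the original row order and index, so it can be assigned straight back to the dataframe as a new column.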

But the expected output treats a client with only one row differently: there, col1 takes beg and col2 takes end. We can handle that case by applying a mask with where, using a condition on the group size:

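g = df.groupby('id')
# True for rows whose id appears more than once
m = g['beg'].transform('size').gt(1)
# previous end within the group; single-row groups fall back to beg
df['col1'] = g['end'].shift().where(m, df['beg'])
# current beg wherever col1 is set
df['col2'] = df['beg'].where(df['col1'].notnull())
# single-row groups take end instead
df['col2'] = df['col2'].where(m, df['end'])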

output:

        id                  beg                  end                 col1                 col2
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17  2021-10-21 10:41:53
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00  2021-10-21 10:00:00  2021-10-21 14:00:00
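
The same logic can be wrapped in a function that leaves the input untouched. This is only a sketch under a few assumptions: the function name add_prev_columns is hypothetical, the columns are named id, beg and end as in the question, and the dates are parsed as datetimes (so empty cells are NaT).

```python
import pandas as pd


def add_prev_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with col1 and col2 added (hypothetical helper)."""
    out = df.copy()
    g = out.groupby("id")
    # True for rows whose id appears more than once.
    multi = g["beg"].transform("size").gt(1)
    # Previous end within each id group; missing on the first row of a group.
    prev_end = g["end"].shift()
    # Single-row groups fall back to their own beg / end values.
    out["col1"] = prev_end.where(multi, out["beg"])
    out["col2"] = out["beg"].where(prev_end.notna()).where(multi, out["end"])
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": pd.to_datetime([
        "2021-10-19 16:01:01", "2021-10-21 10:41:53",
        "2021-10-21 09:00:00", "2021-10-21 10:00:00",
        "2021-10-21 10:00:00",
    ]),
    "end": pd.to_datetime([
        "2021-10-21 08:19:17", "2021-10-24 07:53:57",
        "2021-10-21 10:00:00", "2021-10-22 14:00:00",
        "2021-10-21 14:00:00",
    ]),
})

result = add_prev_columns(df)
print(result)
```

If clients that appear only once should stay empty instead, as the wording of the question suggests, dropping the two `.where(multi, ...)` calls leaves col1 and col2 missing for those rows.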
