python: iterate in groupby pandas, add new columns based on previous value

My dataframe is:

id            beg                       end                 
client1     2021-10-19 16:01:01       2021-10-21 08:19:17                
client1     2021-10-21 10:41:53       2021-10-24 07:53:57  
client3     2021-10-21 09:00:00       2021-10-21 10:00:00       
client3     2021-10-21 10:00:00       2021-10-22 14:00:00             
client2     2021-10-21 10:00:00       2021-10-21 14:00:00
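
For anyone who wants to reproduce this, the sample can be rebuilt roughly as follows. This is only a sketch: it assumes beg and end should be real datetimes rather than strings, which the question does not state.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": ["2021-10-19 16:01:01", "2021-10-21 10:41:53",
            "2021-10-21 09:00:00", "2021-10-21 10:00:00",
            "2021-10-21 10:00:00"],
    "end": ["2021-10-21 08:19:17", "2021-10-24 07:53:57",
            "2021-10-21 10:00:00", "2021-10-22 14:00:00",
            "2021-10-21 14:00:00"],
})

# Assumption: parse the timestamp strings into datetime columns.
for col in ("beg", "end"):
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
```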

I want to add columns based on the previous row's values within each id, as shown below.

If a client appears more than once, I want to fill two new columns on the second appearance of that client:

  • col1 takes the end of the previous row
  • col2 takes the beg of the current row (in this example, for client1 and client3).

Otherwise, put nothing in col1 and col2.

Expected output:

id          beg                    end                    col1                   col2
client1     2021-10-19 16:01:01    2021-10-21 08:19:17    -                      -
client1     2021-10-21 10:41:53    2021-10-24 07:53:57    2021-10-21 08:19:17    2021-10-21 10:41:53
client3     2021-10-21 09:00:00    2021-10-21 10:00:00    -                      -
client3     2021-10-21 10:00:00    2021-10-22 14:00:00    2021-10-21 10:00:00    2021-10-21 10:00:00
client2     2021-10-21 10:00:00    2021-10-21 14:00:00    2021-10-21 10:00:00    2021-10-21 14:00:00

Let us start with the easy part, getting only the previous value.

We can use groupby + shift, which moves end down by one row within each id:

df['col1'] = df.groupby('id')['end'].shift()

output:

        id                  beg                  end                 col1
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00                  NaN
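
To see that the shift really stays inside each group, here is a small self-contained sketch. The frame demo and the column prev_end are made up for illustration and are not part of the question; the rows are deliberately interleaved.

```python
import pandas as pd

# Hypothetical toy data: ids "a" and "b" are interleaved, "c" occurs once.
demo = pd.DataFrame({
    "id": ["a", "b", "a", "b", "c"],
    "end": [1, 10, 2, 20, 100],
})

# shift() moves values down by one row *within* each id group,
# so the first row of every group has no previous value.
demo["prev_end"] = demo.groupby("id")["end"].shift()

print(demo)
```

Rows 0, 1 and 4 are the first row of their group, so they get NaN; rows 2 and 3 pick up the earlier end of the same id, not of the row directly above them.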

But the expected output treats a client with only one row differently: there, col1 and col2 take that row's own beg and end. We can handle this with a mask built from the group size, applied using where:

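g = df.groupby('id')
# True for rows whose id occurs more than once
m = g['beg'].transform('size').gt(1)
# previous end within the group; single-row groups fall back to their own beg
df['col1'] = g['end'].shift().where(m, df['beg'])
# keep beg only where col1 is set; single-row groups take their own end
df['col2'] = df['beg'].where(df['col1'].notnull())
df['col2'] = df['col2'].where(m, df['end'])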

output:

        id                  beg                  end                 col1                 col2
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17  2021-10-21 10:41:53
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00  2021-10-21 10:00:00  2021-10-21 14:00:00
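
Putting the pieces together, a runnable sketch could look like this. The function name add_prev_columns is hypothetical, and the columns are parsed as datetimes here (an assumption), so missing values show up as NaT instead of NaN.

```python
import pandas as pd

def add_prev_columns(df):
    """Return a copy of df with col1/col2 filled as in the expected output."""
    out = df.copy()
    g = out.groupby("id")
    # True for rows whose id occurs more than once.
    multi = g["beg"].transform("size").gt(1)
    # Previous end within the group; single-row groups use their own beg.
    out["col1"] = g["end"].shift().where(multi, out["beg"])
    # Current beg where col1 is set; single-row groups use their own end.
    out["col2"] = out["beg"].where(out["col1"].notna()).where(multi, out["end"])
    return out

sample = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": pd.to_datetime(["2021-10-19 16:01:01", "2021-10-21 10:41:53",
                           "2021-10-21 09:00:00", "2021-10-21 10:00:00",
                           "2021-10-21 10:00:00"]),
    "end": pd.to_datetime(["2021-10-21 08:19:17", "2021-10-24 07:53:57",
                           "2021-10-21 10:00:00", "2021-10-22 14:00:00",
                           "2021-10-21 14:00:00"]),
})

result = add_prev_columns(sample)
print(result)
```

Working on a copy keeps the input frame untouched, which makes the function easier to reuse than the in-place assignments above.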
