python: iterate in groupby pandas, add new columns based on previous value

My dataframe is:

id            beg                       end                 
client1     2021-10-19 16:01:01       2021-10-21 08:19:17                
client1     2021-10-21 10:41:53       2021-10-24 07:53:57  
client3     2021-10-21 09:00:00       2021-10-21 10:00:00       
client3     2021-10-21 10:00:00       2021-10-22 14:00:00             
client2     2021-10-21 10:00:00       2021-10-21 14:00:00

I want to add columns based on the previous value of a column, per id, as shown below.

If a client appears more than once, then on the second appearance of this client I want to create new columns:

  • col1, which takes the previous end
  • col2, which takes the current beg of this row (in this example, for client1 and client3)

Otherwise, put nothing in col1 and col2.

Expected output:

id          beg                    end                    col1                   col2
client1     2021-10-19 16:01:01    2021-10-21 08:19:17    -                      -
client1     2021-10-21 10:41:53    2021-10-24 07:53:57    2021-10-21 08:19:17    2021-10-21 10:41:53
client3     2021-10-21 09:00:00    2021-10-21 10:00:00    -                      -
client3     2021-10-21 10:00:00    2021-10-22 14:00:00    2021-10-21 10:00:00    2021-10-21 10:00:00
client2     2021-10-21 10:00:00    2021-10-21 14:00:00    2021-10-21 10:00:00    2021-10-21 14:00:00

Let us start with the easy way (only get the previous value):

We can use groupby + shift:

df['col1'] = df.groupby('id')['end'].shift()

output:

        id                  beg                  end                 col1
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00                  NaN
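To try the groupby + shift step in isolation, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as an assumption not stated in the original, parses beg and end into datetimes, so the missing values show up as NaT rather than NaN.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: beg/end are parsed to datetime64 (the question does not state the dtype).
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": pd.to_datetime([
        "2021-10-19 16:01:01", "2021-10-21 10:41:53",
        "2021-10-21 09:00:00", "2021-10-21 10:00:00",
        "2021-10-21 10:00:00",
    ]),
    "end": pd.to_datetime([
        "2021-10-21 08:19:17", "2021-10-24 07:53:57",
        "2021-10-21 10:00:00", "2021-10-22 14:00:00",
        "2021-10-21 14:00:00",
    ]),
})

# Shift 'end' down by one row inside each id group, so every row
# sees the 'end' of the previous row that has the same id.
# The first row of each group has no predecessor and gets a missing value.
df["col1"] = df.groupby("id")["end"].shift()

print(df)
```

Because the shift happens per group, the first row of client3 does not pick up the end of the client1 row above it.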

However, the expected output also fills a client that appears only once (client2) with its own beg and end, so we apply a mask using where with a condition on the group size:

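g = df.groupby('id')
# True for rows whose id occurs more than once
m = g['beg'].transform('size').gt(1)
# previous end within the group; single-row groups get their own beg instead
df['col1'] = g['end'].shift().where(m, df['beg'])
# current beg wherever col1 was filled
df['col2'] = df['beg'].where(df['col1'].notnull())
# single-row groups get their own end instead
df['col2'] = df['col2'].where(m, df['end'])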

output:

        id                  beg                  end                 col1                 col2
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17  2021-10-21 10:41:53
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00  2021-10-21 10:00:00  2021-10-21 14:00:00
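For reuse, the same steps can be wrapped in a function. The sketch below is one possible packaging, not part of the original answer: the name add_prev_columns is hypothetical, the function works on a copy so the caller's frame is left untouched, and beg and end are assumed to be parsed as datetimes.

```python
import pandas as pd


def add_prev_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add col1/col2 following the masking logic above."""
    out = df.copy()  # do not modify the caller's DataFrame
    g = out.groupby("id")

    # True for every row whose id occurs more than once
    multi = g["beg"].transform("size").gt(1)

    # Previous 'end' within the group; single-row groups fall back to 'beg'
    out["col1"] = g["end"].shift().where(multi, out["beg"])

    # Current 'beg' wherever col1 is filled, then 'end' for single-row groups
    out["col2"] = out["beg"].where(out["col1"].notna())
    out["col2"] = out["col2"].where(multi, out["end"])
    return out


# Sample data copied from the question (dtype is an assumption: datetime64)
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": pd.to_datetime([
        "2021-10-19 16:01:01", "2021-10-21 10:41:53",
        "2021-10-21 09:00:00", "2021-10-21 10:00:00",
        "2021-10-21 10:00:00",
    ]),
    "end": pd.to_datetime([
        "2021-10-21 08:19:17", "2021-10-24 07:53:57",
        "2021-10-21 10:00:00", "2021-10-22 14:00:00",
        "2021-10-21 14:00:00",
    ]),
})

result = add_prev_columns(df)
print(result)
```

Note that col2 relies on col1 being non-null to detect "not the first row of the group", so a missing end value in the input would also blank out col2 on the following row.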
