Python: iterating within a pandas groupby to add new columns based on previous values

Question

My dataframe is:

id          beg                    end
client1     2021-10-19 16:01:01    2021-10-21 08:19:17
client1     2021-10-21 10:41:53    2021-10-24 07:53:57
client3     2021-10-21 09:00:00    2021-10-21 10:00:00
client3     2021-10-21 10:00:00    2021-10-22 14:00:00
client2     2021-10-21 10:00:00    2021-10-21 14:00:00

I want to add columns based on the previous value of a column, per id, as shown below.

If a client appears more than once, then I want to create new columns on the second appearance of this client...

col1, which takes the previous end
col2, which takes the beg of the current row (in this example, this applies to client1 and client3).

...else put nothing in col1 and col2.

Expected output:

id          beg                    end                    col1                   col2
client1     2021-10-19 16:01:01    2021-10-21 08:19:17    -                      -
client1     2021-10-21 10:41:53    2021-10-24 07:53:57    2021-10-21 08:19:17    2021-10-21 10:41:53
client3     2021-10-21 09:00:00    2021-10-21 10:00:00    -                      -
client3     2021-10-21 10:00:00    2021-10-22 14:00:00    2021-10-21 10:00:00    2021-10-21 10:00:00
client2     2021-10-21 10:00:00    2021-10-21 14:00:00    2021-10-21 10:00:00    2021-10-21 14:00:00

Answer 1

Let us start with the easy part (only getting the previous value).

We can use groupby + shift:

df['col1'] = df.groupby('id')['end'].shift()

output:

        id                  beg                  end                 col1
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00                  NaN
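
For anyone who wants to run this step, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand and converts beg and end with pd.to_datetime; that conversion is an assumption, since the question does not say what dtype the columns have. With datetime columns the missing values show up as NaT rather than NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": ["2021-10-19 16:01:01", "2021-10-21 10:41:53",
            "2021-10-21 09:00:00", "2021-10-21 10:00:00",
            "2021-10-21 10:00:00"],
    "end": ["2021-10-21 08:19:17", "2021-10-24 07:53:57",
            "2021-10-21 10:00:00", "2021-10-22 14:00:00",
            "2021-10-21 14:00:00"],
})

# Assumption: treat beg/end as real timestamps instead of strings.
df["beg"] = pd.to_datetime(df["beg"])
df["end"] = pd.to_datetime(df["end"])

# Within each id, move 'end' down by one row so that every row
# sees the end of the previous row of the same client.
df["col1"] = df.groupby("id")["end"].shift()

print(df)
```

The first row of every id has no predecessor, so col1 is empty there; this includes client2, which only has one row.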

But the expected output handles a group with only one row differently: there, col1 takes beg and col2 takes end. We can apply a mask using where with a condition on the group size:

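g = df.groupby('id')
# True for rows whose id appears more than once
m = g['beg'].transform('size').gt(1)
# previous end within the group; single-row groups fall back to beg
df['col1'] = g['end'].shift().where(m, df['beg'])
# current beg wherever col1 is filled; single-row groups take end instead
df['col2'] = df['beg'].where(df['col1'].notnull())
df['col2'] = df['col2'].where(m, df['end'])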

output:

        id                  beg                  end                 col1                 col2
0  client1  2021-10-19 16:01:01  2021-10-21 08:19:17                  NaN                  NaN
1  client1  2021-10-21 10:41:53  2021-10-24 07:53:57  2021-10-21 08:19:17  2021-10-21 10:41:53
2  client3  2021-10-21 09:00:00  2021-10-21 10:00:00                  NaN                  NaN
3  client3  2021-10-21 10:00:00  2021-10-22 14:00:00  2021-10-21 10:00:00  2021-10-21 10:00:00
4  client2  2021-10-21 10:00:00  2021-10-21 14:00:00  2021-10-21 10:00:00  2021-10-21 14:00:00
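
The steps above can be put together in one runnable sketch. The function name add_prev_cols is hypothetical, and the pd.to_datetime conversion is again an assumption about the column dtypes; the masking logic itself is the same as in the answer.

```python
import pandas as pd

def add_prev_cols(df):
    """Return a copy of df with col1/col2 filled as in the answer above."""
    out = df.copy()
    g = out.groupby("id")
    # True for rows whose id occurs more than once in the frame.
    multi = g["beg"].transform("size").gt(1)
    # Previous 'end' within the group; single-row groups fall back to 'beg'.
    out["col1"] = g["end"].shift().where(multi, out["beg"])
    # Current 'beg' wherever col1 is filled ...
    out["col2"] = out["beg"].where(out["col1"].notna())
    # ... except for single-row groups, which take 'end' instead.
    out["col2"] = out["col2"].where(multi, out["end"])
    return out

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": pd.to_datetime(["2021-10-19 16:01:01", "2021-10-21 10:41:53",
                           "2021-10-21 09:00:00", "2021-10-21 10:00:00",
                           "2021-10-21 10:00:00"]),
    "end": pd.to_datetime(["2021-10-21 08:19:17", "2021-10-24 07:53:57",
                           "2021-10-21 10:00:00", "2021-10-22 14:00:00",
                           "2021-10-21 14:00:00"]),
})

result = add_prev_cols(df)
print(result)
```

Note that shift() uses the current row order within each id. If the rows are not already in chronological order, sorting first, for example with df.sort_values(['id', 'beg']), is probably what you want before calling the function.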

Answer 1 was posted on 2021-11-03 22:19:52.
