python: iterate in groupby pandas, add new columns based on previous value
My dataframe is:
id beg end
client1 2021-10-19 16:01:01 2021-10-21 08:19:17
client1 2021-10-21 10:41:53 2021-10-24 07:53:57
client3 2021-10-21 09:00:00 2021-10-21 10:00:00
client3 2021-10-21 10:00:00 2021-10-22 14:00:00
client2 2021-10-21 10:00:00 2021-10-21 14:00:00
I want to add columns based on the previous value of a column, per id, as shown below.
If a client appears more than once, then on the second appearance of this client I want to create new columns: col1, which takes the previous end, and col2, which takes the current beg of this row (in this example, for client1 and client3). Otherwise put nothing in col1, col2.
Expected output:
id beg end col1 col2
client1 2021-10-19 16:01:01 2021-10-21 08:19:17 - -
client1 2021-10-21 10:41:53 2021-10-24 07:53:57 2021-10-21 08:19:17 2021-10-21 10:41:53
client3 2021-10-21 09:00:00 2021-10-21 10:00:00 - -
client3 2021-10-21 10:00:00 2021-10-22 14:00:00 2021-10-21 10:00:00 2021-10-21 10:00:00
client2 2021-10-21 10:00:00 2021-10-21 14:00:00 2021-10-21 10:00:00 2021-10-21 14:00:00
Let us start with the easy way (only get the previous value):
We can use groupby + shift:
df['col1'] = df.groupby('id')['end'].shift()
output:
id beg end col1
0 client1 2021-10-19 16:01:01 2021-10-21 08:19:17 NaN
1 client1 2021-10-21 10:41:53 2021-10-24 07:53:57 2021-10-21 08:19:17
2 client3 2021-10-21 09:00:00 2021-10-21 10:00:00 NaN
3 client3 2021-10-21 10:00:00 2021-10-22 14:00:00 2021-10-21 10:00:00
4 client2 2021-10-21 10:00:00 2021-10-21 14:00:00 NaN
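To see what the grouping contributes, here is a minimal sketch comparing a plain shift with the grouped one. The toy frame and the names `demo`, `plain` and `grouped` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy data: two ids with two rows each
demo = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "end": [1, 2, 3, 4],
})

# Plain shift ignores the id: the first row of "b" receives the last value of "a"
plain = demo["end"].shift()

# Grouped shift restarts in every group: the first row of each id is missing
grouped = demo.groupby("id")["end"].shift()

print(pd.DataFrame({"id": demo["id"], "plain": plain, "grouped": grouped}))
```

In the grouped result the first row of every id is missing, which is why no value leaks from one client into the next.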
But the expected output treats a group with only one row differently: there col1 takes beg and col2 takes end. So we can apply a mask using where and a condition on the group size:
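g = df.groupby('id')
# True for rows whose id occurs more than once
m = g['beg'].transform('size').gt(1)
# previous end within the same id; single-row ids fall back to their own beg
df['col1'] = g['end'].shift().where(m, df['beg'])
# current beg wherever col1 is filled, otherwise missing
df['col2'] = df['beg'].where(df['col1'].notnull())
# single-row ids take their own end instead
df['col2'] = df['col2'].where(m, df['end'])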
output:
id beg end col1 col2
0 client1 2021-10-19 16:01:01 2021-10-21 08:19:17 NaN NaN
1 client1 2021-10-21 10:41:53 2021-10-24 07:53:57 2021-10-21 08:19:17 2021-10-21 10:41:53
2 client3 2021-10-21 09:00:00 2021-10-21 10:00:00 NaN NaN
3 client3 2021-10-21 10:00:00 2021-10-22 14:00:00 2021-10-21 10:00:00 2021-10-21 10:00:00
4 client2 2021-10-21 10:00:00 2021-10-21 14:00:00 2021-10-21 10:00:00 2021-10-21 14:00:00
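For anyone who wants to reproduce this end to end, the following is a self-contained sketch under a few assumptions: the frame is rebuilt by hand from the question's sample, beg and end are converted with `pd.to_datetime` (so missing values print as NaT rather than NaN), and the group size is taken from the id column instead of beg, which should give the same mask.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": ["client1", "client1", "client3", "client3", "client2"],
    "beg": ["2021-10-19 16:01:01", "2021-10-21 10:41:53", "2021-10-21 09:00:00",
            "2021-10-21 10:00:00", "2021-10-21 10:00:00"],
    "end": ["2021-10-21 08:19:17", "2021-10-24 07:53:57", "2021-10-21 10:00:00",
            "2021-10-22 14:00:00", "2021-10-21 14:00:00"],
})

# Assumption: the columns should hold real timestamps rather than strings
df["beg"] = pd.to_datetime(df["beg"])
df["end"] = pd.to_datetime(df["end"])

g = df.groupby("id")

# True for every row whose id occurs more than once
m = g["id"].transform("size").gt(1)

# Previous end within the same id; single-row ids fall back to their own beg
df["col1"] = g["end"].shift().where(m, df["beg"])

# Current beg wherever col1 is filled; single-row ids take their own end
df["col2"] = df["beg"].where(df["col1"].notna())
df["col2"] = df["col2"].where(m, df["end"])

print(df)
```

Note that the rows must already be ordered within each id for "previous" to mean what the question intends; if they are not, sorting by id and beg first would be a reasonable extra step.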