Pandas create counter column for group but reset count based on multiple conditions
I have the following DataFrame:
Worker dt_diff same_employer same_role
1754 0 days 00:00:00 False False
2951 0 days 00:00:00 False False
2951 1 days 00:00:00 True True
2951 1 days 01:00:00 True True
3368 0 days 00:00:00 False False
3368 7 days 00:00:00 True True
3368 7 days 00:00:00 True True
3368 7 days 00:00:00 True True
3368 7 days 00:00:00 True True
3368 7 days 00:00:00 True True
3539 0 days 00:00:00 False False
3539 1 days 00:00:00 True True
3539 1 days 00:00:00 True True
3539 3 days 00:30:00 False False
3539 1 days 00:00:00 True True
3539 2 days 06:00:00 False True
I would like to create a new column containing a continuity counter grouped by worker. However, the counter should be based on the following condition:
if (dt_diff > 6 days) or (same_employer == False) or (same_role == False), then reset the counter
So for the above DataFrame I would expect the result below:
Worker Counter
1754 1
2951 3
3368 1
3539 3
Your description is not very explicit, but IIUC, you want the length of the last continuous run for each worker.
For this you can use boolean masks and groupby. Use cummin on the reversed boolean series to keep only the rows after the last False, then add 1 so that the row that reset the counter is counted too.
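# True where the run continues. Note the `|`: a row continues if either flag is True,
# which reproduces the expected output; the rule as stated in the question would use `&`.
s = df['dt_diff'].lt('6d') & (df['same_employer'] | df['same_role'])
# Per worker: reverse, let cummin keep only the trailing True rows, count them, add 1 for the reset row
out = s.groupby(df['Worker']).apply(lambda x: x[::-1].cummin().sum() + 1)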
Output:
Worker
1754 1
2951 3
3368 1
3539 3
dtype: int64
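To try the approach above end to end, here is a minimal sketch. It assumes the sample rows from the question are typed in by hand and that dt_diff is converted with pd.to_timedelta; the helper name trailing_run is hypothetical and only wraps the reversed-cummin step so it can be inspected on its own.

```python
import pandas as pd

# Sample rows from the question: (Worker, dt_diff, same_employer, same_role).
rows = [
    (1754, "0 days 00:00:00", False, False),
    (2951, "0 days 00:00:00", False, False),
    (2951, "1 days 00:00:00", True, True),
    (2951, "1 days 01:00:00", True, True),
    (3368, "0 days 00:00:00", False, False),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3539, "0 days 00:00:00", False, False),
    (3539, "1 days 00:00:00", True, True),
    (3539, "1 days 00:00:00", True, True),
    (3539, "3 days 00:30:00", False, False),
    (3539, "1 days 00:00:00", True, True),
    (3539, "2 days 06:00:00", False, True),
]
df = pd.DataFrame(rows, columns=["Worker", "dt_diff", "same_employer", "same_role"])
df["dt_diff"] = pd.to_timedelta(df["dt_diff"])  # strings -> timedelta64

# Continuity mask as used in the answer (employer OR role unchanged).
s = (df["dt_diff"] < pd.Timedelta(days=6)) & (df["same_employer"] | df["same_role"])


def trailing_run(mask: pd.Series) -> int:
    """Hypothetical helper: number of trailing True values in `mask`, plus 1."""
    # Reversed, the trailing True rows come first; cummin turns everything
    # from the first False onwards into 0, so the sum counts only that run.
    reversed_int = mask[::-1].astype(int)
    return int(reversed_int.cummin().sum()) + 1


out = s.groupby(df["Worker"]).apply(trailing_run)
print(out)
```

Casting to int before cummin is only a precaution so the accumulation runs on plain integers; the logic is the same as in the one-liner.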
I would expect the counter for worker 3539 to be 1, because the last row should have reset it.
Your condition:
s = ~((df['dt_diff'].dt.days > 6) | (df['same_employer'] == False) | (df['same_role'] == False))
The key is to count the rows from the end of each group back to the last row that triggers a reset. We can build a mask for that with:
y = s[::-1].groupby(df['Worker']).cumprod()
Then we sum over the mask and add 1 at the end:
print(y.groupby(df['Worker']).sum()+1)
Worker
1754 1
2951 3
3368 1
3539 1
dtype: int64
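If a per-row counter column is wanted rather than only the final value per worker, one possible variant (a different technique from the reversed cumprod above, not part of the original answer) is to number the reset blocks with a grouped cumsum and then use cumcount inside each block. The sketch below assumes the strict reset rule from the question and uses only workers 2951 and 3539 from the sample data; the names reset, block and Counter are illustrative.

```python
import pandas as pd

# Subset of the question's data: workers 2951 and 3539 only.
df = pd.DataFrame({
    "Worker": [2951] * 3 + [3539] * 6,
    "dt_diff": pd.to_timedelta([
        "0 days 00:00:00", "1 days 00:00:00", "1 days 01:00:00",
        "0 days 00:00:00", "1 days 00:00:00", "1 days 00:00:00",
        "3 days 00:30:00", "1 days 00:00:00", "2 days 06:00:00",
    ]),
    "same_employer": [False, True, True, False, True, True, False, True, False],
    "same_role": [False, True, True, False, True, True, False, True, True],
})

# Reset rule exactly as stated in the question.
reset = (
    (df["dt_diff"] > pd.Timedelta(days=6))
    | ~df["same_employer"]
    | ~df["same_role"]
)

# Each reset starts a new block within its worker.
block = reset.astype(int).groupby(df["Worker"]).cumsum()

# Running counter: position inside the (worker, block) pair, starting at 1.
df["Counter"] = df.groupby([df["Worker"], block]).cumcount() + 1

# The last counter value per worker corresponds to the result printed above.
last = df.groupby("Worker")["Counter"].last()
print(df[["Worker", "Counter"]])
print(last)
```

On this subset the final values come out as 3 for worker 2951 and 1 for worker 3539, in line with the output shown above.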