Pandas: create a counter column per group but reset the count based on multiple conditions

Question

I have the following DataFrame:

Worker  dt_diff          same_employer  same_role
1754    0 days 00:00:00  False          False
2951    0 days 00:00:00  False          False
2951    1 days 00:00:00  True           True
2951    1 days 01:00:00  True           True
3368    0 days 00:00:00  False          False
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3539    0 days 00:00:00  False          False
3539    1 days 00:00:00  True           True
3539    1 days 00:00:00  True           True
3539    3 days 00:30:00  False          False
3539    1 days 00:00:00  True           True
3539    2 days 06:00:00  False          True
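
For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes dt_diff is a timedelta column (the question shows its printed form, not its dtype), and the variable name rows is made up for illustration.

```python
import pandas as pd

# Rows copied from the table in the question.
rows = [
    (1754, "0 days 00:00:00", False, False),
    (2951, "0 days 00:00:00", False, False),
    (2951, "1 days 00:00:00", True, True),
    (2951, "1 days 01:00:00", True, True),
    (3368, "0 days 00:00:00", False, False),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3368, "7 days 00:00:00", True, True),
    (3539, "0 days 00:00:00", False, False),
    (3539, "1 days 00:00:00", True, True),
    (3539, "1 days 00:00:00", True, True),
    (3539, "3 days 00:30:00", False, False),
    (3539, "1 days 00:00:00", True, True),
    (3539, "2 days 06:00:00", False, True),
]
df = pd.DataFrame(rows, columns=["Worker", "dt_diff", "same_employer", "same_role"])

# Assumption: dt_diff should be a timedelta, not a plain string.
df["dt_diff"] = pd.to_timedelta(df["dt_diff"])

print(df)
```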

I would like to create a new column containing a continuity counter grouped by worker. However, the counter should be based on the following conditions:

if (dt_diff > 6 days) or (same_employer == False) or (same_role == False), then reset the counter

So for the above DataFrame I would expect the following result:

Worker  Counter
1754    1
2951    3
3368    1
3539    3

Answer 1

Your description is not fully explicit, but IIUC, you want the last continuity, that is, the final unbroken run for each worker.

For this you can use boolean masks and groupby. Use cummin on the reversed boolean series to keep only the rows after the last False (add 1 to count the False row itself, which starts the run).

s = df['dt_diff'].lt('6d') & (df['same_employer'] | df['same_role'])

out = s.groupby(df['Worker']).apply(lambda x:x[::-1].cummin().sum()+1)

Output:

Worker
1754    1
2951    3
3368    1
3539    3
dtype: int64
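
To see why the reversed cummin picks out the trailing run, here is a small step-through for worker 3539 only. It is a sketch, not part of the original answer: the intermediate names rev, tail and counter are made up, and pd.Timedelta("6d") is used in place of the bare string '6d'.

```python
import pandas as pd

# Rows for worker 3539 only, copied from the question's table.
df = pd.DataFrame({
    "Worker": [3539] * 6,
    "dt_diff": pd.to_timedelta([
        "0 days", "1 days", "1 days", "3 days 00:30:00", "1 days", "2 days 06:00:00",
    ]),
    "same_employer": [False, True, True, False, True, False],
    "same_role":     [False, True, True, False, True, True],
})

# Continuity flag as written in this answer (note the `|` between the two flags).
s = df["dt_diff"].lt(pd.Timedelta("6d")) & (df["same_employer"] | df["same_role"])

rev = s[::-1]                   # last row first
tail = rev.cummin()             # stays True only until the first False from the end
counter = int(tail.sum()) + 1   # +1 for the row that starts the run

print(s.tolist())      # [False, True, True, False, True, True]
print(tail.tolist())   # [True, True, False, False, False, False]
print(counter)         # 3
```

Because the two flags are combined with `|`, the last row of 3539 (same_employer False, same_role True) still counts as continuous, which is how this version reproduces the 3 expected in the question.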

Answer 2

I would expect the counter for worker 3539 to be 1, because the last row should have reset it.

Your condition:

s = ~((df['dt_diff'].dt.days > 6) | (df['same_employer'] == False) | (df['same_role'] == False))

The key is to count backwards from the last row of each worker up to the last row where s is False (that is, the last reset). We can create a mask for that with:

y = s[::-1].groupby(df['Worker']).cumprod()
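
As an illustration of what this mask looks like, here is a sketch on a subset of the question's data (workers 2951 and 3539 only). The astype(int) cast is an addition of mine so the cumulative product is computed on integers; the original line works on the boolean series directly.

```python
import pandas as pd

# Subset of the question's table: workers 2951 and 3539.
df = pd.DataFrame({
    "Worker": [2951] * 3 + [3539] * 6,
    "dt_diff": pd.to_timedelta([
        "0 days", "1 days", "1 days 01:00:00",
        "0 days", "1 days", "1 days", "3 days 00:30:00", "1 days", "2 days 06:00:00",
    ]),
    "same_employer": [False, True, True, False, True, True, False, True, False],
    "same_role":     [False, True, True, False, True, True, False, True, True],
})

# True where the row continues the previous one (no reset).
s = ~((df["dt_diff"].dt.days > 6)
      | (df["same_employer"] == False)
      | (df["same_role"] == False))

# Reverse the rows, then take a running product within each worker:
# the product drops to 0 at the first reset seen from the end and stays 0.
y = s[::-1].astype(int).groupby(df["Worker"]).cumprod()

# Shown in the original row order for readability.
print(y.sort_index().tolist())   # [0, 1, 1, 0, 0, 0, 0, 0, 0]
```

For worker 3539 the mask is all zeros because the very last row is itself a reset (same_employer is False).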

Then we sum over the mask and add 1 at the end:

print(y.groupby(df['Worker']).sum()+1)

Worker
1754    1
2951    3
3368    1
3539    1
dtype: int64
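
If a running counter on every row is wanted (the question asks for a new column), one possible sketch is to number the resets within each worker and then count rows inside each numbered block. This is not from either answer: the names reset, block and Counter are assumptions, and the reset rule is taken literally from the question.

```python
import pandas as pd

# Subset of the question's table: workers 2951 and 3539.
df = pd.DataFrame({
    "Worker": [2951] * 3 + [3539] * 6,
    "dt_diff": pd.to_timedelta([
        "0 days", "1 days", "1 days 01:00:00",
        "0 days", "1 days", "1 days", "3 days 00:30:00", "1 days", "2 days 06:00:00",
    ]),
    "same_employer": [False, True, True, False, True, True, False, True, False],
    "same_role":     [False, True, True, False, True, True, False, True, True],
})

# Literal reading of the rule in the question: any of the three conditions resets.
reset = (
    (df["dt_diff"] > pd.Timedelta(days=6))
    | ~df["same_employer"]
    | ~df["same_role"]
)

# Each reset starts a new block within its worker.
block = reset.groupby(df["Worker"]).cumsum()

# Position of each row inside its (worker, block) pair, starting at 1.
df["Counter"] = df.groupby(["Worker", block]).cumcount() + 1

print(df[["Worker", "Counter"]])
print(df.groupby("Worker")["Counter"].last())
```

On this subset the last Counter value per worker is 3 for 2951 and 1 for 3539, which agrees with the output of this answer rather than with the 3 expected in the question.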

2 answers

Answer 1
1, accepted, 2022-02-26 14:47:07

Answer 2
0, 2022-02-26 15:20:01
