Pandas create counter column for group but reset count based on multiple conditions

I have the following DataFrame:

Worker  dt_diff          same_employer  same_role
1754    0 days 00:00:00  False          False
2951    0 days 00:00:00  False          False
2951    1 days 00:00:00  True           True
2951    1 days 01:00:00  True           True
3368    0 days 00:00:00  False          False
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3368    7 days 00:00:00  True           True
3539    0 days 00:00:00  False          False
3539    1 days 00:00:00  True           True
3539    1 days 00:00:00  True           True
3539    3 days 00:30:00  False          False
3539    1 days 00:00:00  True           True
3539    2 days 06:00:00  False          True

I would like to create a new column containing a continuity counter grouped by worker. However, the counter should follow these conditions:

If (dt_diff > 6 days) or (same_employer == False) or (same_role == False), then reset the counter.

So for the above DataFrame I would expect the result below:

Worker  Counter
1754    1
2951    3
3368    1
3539    3

Your description is not very explicit, but IIUC, you want the length of the last continuity for each worker.

For this you can use boolean masks and groupby. Use cummin on the reversed boolean series to keep only the rows after the last False, then sum them and add 1 to count the row that started the run.

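# True where the row continues the current run
s = df['dt_diff'].lt('6d') & (df['same_employer'] | df['same_role'])

# per worker: length of the trailing run of True values, plus 1
out = s.groupby(df['Worker']).apply(lambda x: x[::-1].cummin().sum() + 1)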

Output:

Worker
1754    1
2951    3
3368    1
3539    3
dtype: int64
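For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same reversed-cummin idea. The DataFrame is rebuilt by hand from the table in the question, the variable name `keep` is my own, and the mask is cast to int before cummin purely as a precaution, so treat it as an illustration under those assumptions rather than the exact code above.

```python
import pandas as pd

# Sample data copied from the question; dt_diff is parsed into timedeltas.
df = pd.DataFrame({
    "Worker": [1754, 2951, 2951, 2951, 3368, 3368, 3368, 3368, 3368, 3368,
               3539, 3539, 3539, 3539, 3539, 3539],
    "dt_diff": pd.to_timedelta([
        "0 days", "0 days", "1 days", "1 days 01:00:00",
        "0 days", "7 days", "7 days", "7 days", "7 days", "7 days",
        "0 days", "1 days", "1 days", "3 days 00:30:00", "1 days",
        "2 days 06:00:00",
    ]),
    "same_employer": [False, False, True, True,
                      False, True, True, True, True, True,
                      False, True, True, False, True, False],
    "same_role": [False, False, True, True,
                  False, True, True, True, True, True,
                  False, True, True, False, True, True],
})

# True where the row continues the current run. Note the `|` between the
# two flags, as in the answer above: a row only breaks the run on the flags
# when BOTH same_employer and same_role are False.
keep = df["dt_diff"].lt(pd.Timedelta(days=6)) & (df["same_employer"] | df["same_role"])

# Per worker: reverse the mask, let cummin switch it off for good at the
# first 0 seen from the end, sum what is left, and add 1.
out = keep.groupby(df["Worker"]).apply(
    lambda x: int(x[::-1].astype(int).cummin().sum()) + 1
)

print(out.to_dict())  # {1754: 1, 2951: 3, 3368: 1, 3539: 3}
```

Replacing `|` with `&` in `keep` applies the condition exactly as worded in the question, which changes the value for worker 3539 from 3 to 1.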

I would expect the counter for worker 3539 to be 1, because the last row should have reset it.

Your condition:

s =  ~((df['dt_diff'].dt.days > 6) | (df['same_employer'] == False) | (df['same_role'] == False))
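One detail that may be worth checking here: `.dt.days` returns only the whole-day component, so a gap such as 6 days 01:00:00 is not treated as greater than 6 days. The small sketch below compares that with a comparison against a full `pd.Timedelta`; the series `gaps` is made-up illustration data, not taken from the question.

```python
import pandas as pd

# Hypothetical gaps chosen to sit right around the 6-day threshold.
gaps = pd.Series(pd.to_timedelta(["6 days", "6 days 01:00:00", "7 days"]))

by_days = gaps.dt.days > 6                # compares the whole-day component only
by_delta = gaps > pd.Timedelta(days=6)    # compares the full duration

print(by_days.tolist())   # [False, False, True]
print(by_delta.tolist())  # [False, True, True]
```

On the sample data in the question both forms give the same mask, so this only matters if gaps between 6 and 7 days can occur.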

The key is to count backward from the last row of each worker until you hit the most recent row that triggers a reset. We can create a mask for that with:

y = s[::-1].groupby(df['Worker']).cumprod()

Then we sum over the mask and add 1 at the end:

print(y.groupby(df['Worker']).sum()+1)

Worker
1754    1
2951    3
3368    1
3539    1
dtype: int64
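Since the question asks for a counter column, here is a hedged sketch of a row-by-row variant using the same reset condition. It is a different technique from the reversed cumprod above (a cumulative sum of reset flags to label blocks, then cumcount inside each block), the names `reset`, `block` and `Counter` are my own, and only the rows for workers 2951 and 3539 are rebuilt from the question to keep it short.

```python
import pandas as pd

# Rows for workers 2951 and 3539 only, copied from the question.
df = pd.DataFrame({
    "Worker": [2951] * 3 + [3539] * 6,
    "dt_diff": pd.to_timedelta([
        "0 days", "1 days", "1 days 01:00:00",
        "0 days", "1 days", "1 days", "3 days 00:30:00", "1 days",
        "2 days 06:00:00",
    ]),
    "same_employer": [False, True, True, False, True, True, False, True, False],
    "same_role": [False, True, True, False, True, True, False, True, True],
})

# True on rows that restart the count (the condition as worded in the question).
reset = (df["dt_diff"] > pd.Timedelta(days=6)) | ~df["same_employer"] | ~df["same_role"]

# Each reset row opens a new block within its worker.
block = reset.astype(int).groupby(df["Worker"]).cumsum()

# Number the rows inside each (worker, block) pair, starting from 1.
df["Counter"] = df.groupby(["Worker", block]).cumcount() + 1

print(df["Counter"].tolist())
# [1, 2, 3, 1, 2, 3, 1, 2, 1]
print(df.groupby("Worker")["Counter"].last().to_dict())
# {2951: 3, 3539: 1}
```

On this data the last counter value per worker agrees with the per-worker result printed above.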
