Assign group numbers to consecutive 1s in a Pandas column

Question

I have a pandas column containing only the values 0 and 1. I want to assign a group number to each run of more than 9 consecutive 1s.

Example: say my column values are: [1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1]

I want a new column, or to change the same column, so that it reads: [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,2,2,2,2,2,2,2,2,2,2,2,2,2,0,0,0,3,3,3,3,3,3,3,3,3,3,3]

I got up to the point where I can replace every run of consecutive 1s (with a count greater than 9) by another number, say 2. Here is the code:

def f(col, threshold=9):
    mask = col.groupby((col != col.shift()).cumsum()).transform('count').gt(threshold)
    mask &= col.eq(1)
    #print (mask)
    col.update(col.loc[mask].replace(1,2))
    return col

Answer 1

Find the consecutive groups of 1s and determine the size of each group. Use where to mask any groups of 0s, and any groups of 1s that are too small; ngroup then lets you label the remaining groups properly. NaN rows get labeled -1 and you want the counting to start at 1, so adding 1 fixes both of these at once.

import pandas as pd
s = pd.Series([1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,
               1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,
               1,0,0,0,1,1,1,1,1,1,1,1,1,1,1])

u = s.ne(s.shift()).cumsum().where(s.eq(1))  # Label consecutive groups of 1s, NaN 0s
u = u.groupby(u).transform('size').gt(9)     # True only if 1s and size > 9.

# Any smaller groups or 0s get NaN'd by `where` which are labeled -1 by `ngroup`
result = u.groupby(u.ne(u.shift()).cumsum().where(u)).ngroup()+1

print(result.tolist())
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 
 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
 2, 2, 2, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
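If you would rather not rely on how ngroup labels NaN keys (that behaviour is not something I would count on across pandas versions), the same labelling can be sketched with a cumulative sum over run starts. The function name label_long_runs and its threshold parameter are my own, not from the answer; treat this as a sketch, not the canonical way.

```python
import pandas as pd

def label_long_runs(s: pd.Series, threshold: int = 9) -> pd.Series:
    """Number each run of 1s longer than `threshold`; every other row gets 0."""
    # A new run id starts every time the value changes.
    run_id = s.ne(s.shift()).cumsum()
    # Length of the run that each row belongs to.
    run_len = run_id.groupby(run_id).transform('size')
    # Rows that are 1 and sit in a long enough run.
    keep = s.eq(1) & run_len.gt(threshold)
    # The first row of each kept run: True now, False on the previous row.
    starts = keep & ~keep.shift(fill_value=False)
    # Counting the starts numbers the runs; rows outside kept runs become 0.
    return starts.cumsum().where(keep, 0)

# Small usage example with a lower threshold
print(label_long_runs(pd.Series([1, 1, 0, 1, 1, 1]), threshold=2).tolist())
# [0, 0, 0, 1, 1, 1]
```

Called with the default threshold on the series s from the question, this should reproduce the output shown above.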

Answer 2

My approach:

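import pandas as pd

s = pd.Series([1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,
               1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,
               1,0,0,0,1,1,1,1,1,1,1,1,1,1,1])

# Every non-1 value starts a new group, so the group sum is the length of the run of 1s.
# Keep only the rows that are 1 and sit in a run of more than 9 ones.
u = s.groupby(s.ne(1).cumsum()).transform('sum').gt(9) & s.eq(1)

# A run starts where u is True and the previous row is False;
# the cumulative sum numbers the runs and mul(u) zeroes everything else.
result = (~u.shift(fill_value=False) & u).cumsum().mul(u)
print(result.to_numpy())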

Output:

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 2 2 2
 2 2 2 2 2 2 2 2 2 2 0 0 0 3 3 3 3 3 3 3 3 3 3 3]
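To see why that groupby key works, here is a small sketch on a toy series (the series and the variable names are mine, not from the answer, and it assumes the column holds only 0s and 1s). Every non-1 value bumps the cumulative counter, so each run of 1s shares a key with the 0 just before it, and the group sum is the length of that run.

```python
import pandas as pd

toy = pd.Series([1, 1, 0, 1, 1, 1, 0, 0, 1])

# The counter increases at every value that is not 1.
key = toy.ne(1).cumsum()
# Summing within each key counts the 1s that share it.
run_sum = toy.groupby(key).transform('sum')

print(key.tolist())      # [0, 0, 1, 1, 1, 1, 2, 3, 3]
print(run_sum.tolist())  # [2, 2, 3, 3, 3, 3, 0, 1, 1]
```

Note that the 0 at position 2 also receives the sum 3, which is why the result still has to be combined with a check that the row itself is 1.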

Answer 1: score 4, accepted, 2019-12-09 20:24:21

Answer 2: score 2, 2019-12-09 20:48:36