
Pandas - changing rows where less than n subsequent values are equal

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,1,1,0,0,1,1,1,0,1,1,1,1,0,0,0]})

Now I would like to set to zero every row that belongs to a run of fewer than four consecutive 1s, i.e. I would like to end up with the following DataFrame:

df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0]})

I was not able to find a clean way to achieve this.

Try with groupby and where:

streaks = df.groupby(df["col"].ne(df["col"].shift()).cumsum()).transform("sum")
output = df.where(streaks.ge(4), 0)

>>> output
    col
0     0
1     0
2     1
3     1
4     1
5     1
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    1
17    1
18    1
19    1
20    0
21    0
22    0
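
To see why this works, the two lines can be split into intermediate steps. The following is only a sketch of the same idea applied to the question's data; the names run_id and streaks are chosen here for illustration, and it operates on the single column rather than the whole DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"col": [0,0,1,1,1,1,0,0,1,1,0,0,1,1,1,0,1,1,1,1,0,0,0]})

# A new run starts wherever a value differs from the previous row.
# The cumulative sum of those change points gives every run its own integer id.
run_id = df["col"].ne(df["col"].shift()).cumsum()

# Summing inside each run gives the run length for runs of 1s
# and 0 for runs of 0s (this relies on the column holding only 0 and 1).
streaks = df["col"].groupby(run_id).transform("sum")

# Keep values whose run sum reaches 4; replace everything else with 0.
output = df["col"].where(streaks.ge(4), 0)
```

Because the run length is obtained by summing, this variant assumes the column contains only 0s and 1s.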

We can do:

df.loc[df.groupby(df.col.eq(0).cumsum()).transform('count')['col']<5,'col'] = 0
df
Out[77]: 
    col
0     0
1     0
2     1
3     1
4     1
5     1
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    1
17    1
18    1
19    1
20    0
21    0
22    0
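
Note that the one-liner groups each 0 together with the 1s that follow it, so a kept group has a count of 5 (one 0 plus four 1s). A column that starts with 1s has no leading 0 in its first group, so an initial run of exactly four 1s would be counted as 4 and zeroed. A minimal sketch of a variant that counts run lengths directly is shown below; the helper name zero_short_runs and its parameters are hypothetical, not part of pandas or of either answer.

```python
import pandas as pd

def zero_short_runs(s: pd.Series, n: int = 4, value=1) -> pd.Series:
    """Return a copy of s in which runs of `value` shorter than n are set to 0."""
    # Give every run of identical consecutive values its own id.
    run_id = s.ne(s.shift()).cumsum()
    # Length of the run each row belongs to.
    run_len = s.groupby(run_id).transform("size")
    # Rows that hold `value` but sit in a run that is too short.
    short = s.eq(value) & run_len.lt(n)
    return s.mask(short, 0)

# Example with a column that starts with a run of four 1s.
s = pd.Series([1, 1, 1, 1, 0, 1, 1, 0])
result = zero_short_runs(s)
```

With the question's data this could be applied as df["col"] = zero_short_runs(df["col"]).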
