groupby streak of numbers in one column of pandas dataframe
This is my dataframe:
import pandas as pd
df = pd.DataFrame(
    {
        'a': [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        'b': [0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
    }
)
And this is the result that I want:
a b
2 1 1
3 0 1
4 0 1
5 0 1
6 0 0
7 0 0
9 1 0
10 0 1
13 1 1
14 0 1
15 0 1
16 0 0
17 0 1
I want to group this dataframe based on the values in column b. The first step is to find the 1s in column a. Starting from each of those rows, I want to continue until a 0 appears in column b, and then also take the row right after that 0. If the value in a is 1 and the value in b is 0, I want to continue for only one more row. Basically, I want to stop as soon as there is a 0 in column b and then go on one row past that 0.
I have tried these two posts: post1, post2, but I still can't solve this.
I have tried to group the rows with df.b.diff().cumsum(), but it doesn't give me what I want.
Use cumsum to create a helper Series for filtering/grouping, then subfilter each group with a boolean mask:
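# each 1 in column a starts a new group; rows before the first 1 get group 0
group = df['a'].cumsum()
for k, g in df[group>0].groupby(group):
    # keep rows up to the first 0 in b plus one row after it
    # (i.e. drop everything from 2 places after the first 0)
    m = g['b'].ne(0).cummin().shift(2, fill_value=True)
    print(g[m])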
Output:
a b
2 1 1
3 0 1
4 0 1
5 0 1
6 0 0
7 0 0
a b
9 1 0
10 0 1
a b
13 1 1
14 0 1
15 0 1
16 0 0
17 0 1
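To see why the mask keeps exactly these rows, it can help to look at the intermediate steps for a single group. The sketch below rebuilds the question's dataframe and inspects the group that starts at row 2; the names nonzero, before_first_zero, mask and steps are illustrative only and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        'b': [0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0],
    }
)

group = df['a'].cumsum()
g = df[group == 1]  # the first group: rows 2 through 8

# True where b is nonzero
nonzero = g['b'].ne(0)
# stays True until the first 0 in b, then False for the rest of the group
before_first_zero = nonzero.cummin()
# shifting by 2 lets the first 0 and the row after it through as well
mask = before_first_zero.shift(2, fill_value=True)

steps = pd.DataFrame(
    {
        'b': g['b'],
        'nonzero': nonzero,
        'before_first_zero': before_first_zero,
        'mask': mask,
    }
)
print(steps)
print(g[mask].index.tolist())  # [2, 3, 4, 5, 6, 7]
```

Because fill_value=True fills the two positions opened up by the shift, the first two rows of a group are always kept, which is what handles the case where a is 1 and b is 0 on the same row.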
Either run the above and concat the results, or:
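group = df['a'].cumsum()
# group_keys=False keeps the original row index on the result; recent pandas
# versions would otherwise prepend the group key and break the alignment below
m = df['b'].ne(0).groupby(group, group_keys=False).apply(lambda x: x.cummin().shift(2, fill_value=True))
# drop the rows before the first 1 in a (group 0) and apply the per-group mask
out = df[group.gt(0) & m]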
Output:
a b
2 1 1
3 0 1
4 0 1
5 0 1
6 0 0
7 0 0
9 1 0
10 0 1
13 1 1
14 0 1
15 0 1
16 0 0
17 0 1
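For completeness, the "run the loop and concat" option could look roughly like the sketch below. It collects the filtered groups in a list and joins them with pd.concat; the names started, parts, keep and out_concat are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        'b': [0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0],
    }
)

group = df['a'].cumsum()
started = group > 0  # ignore rows before the first 1 in column a

parts = []
for _, g in df[started].groupby(group[started]):
    # keep rows up to the first 0 in b plus one row after it
    keep = g['b'].ne(0).cummin().shift(2, fill_value=True)
    parts.append(g[keep])

# stitch the filtered groups back into a single dataframe
out_concat = pd.concat(parts)
print(out_concat.index.tolist())
# [2, 3, 4, 5, 6, 7, 9, 10, 13, 14, 15, 16, 17]
```

The loop is easier to step through while debugging, whereas the single-mask version avoids building the intermediate list.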