
Pandas group consecutive and label the length

I want to label each row with the length of the consecutive run it belongs to.


a b 
---
1 1 
0 2 
1 3
1 2
0 1
1 3
1 1
1 3
0 2
1 2
1 1

I want:

a b | c 
--------
1 1   1
0 2   0
1 3   2
1 2   2
0 1   0
1 3   3
1 1   3
1 3   3
0 2   0
1 2   2
1 1   2

Then I can calculate the mean of the "b" column grouped by "c". I tried shift, cumsum, and cumcount, but none of them worked.

Use GroupBy.transform over groups of consecutive values, then set the result to 0 wherever column a is not 1:

df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
              .transform('size')
              .where(df.a.eq(1), 0))
print (df)
    a  b  c  c1
0   1  1  1   1
1   0  2  0   0
2   1  3  2   2
3   1  2  2   2
4   0  1  0   0
5   1  3  3   3
6   1  1  3   3
7   1  3  3   3
8   0  2  0   0
9   1  2  2   2
10  1  1  2   2
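
To see why the one-liner above works, it may help to split it into its intermediate steps. The following is a minimal sketch that rebuilds the frame from the values shown in the printed output; the names starts, run_id, and run_len are illustrative and not part of the original answer.

```python
import pandas as pd

# Data copied from the printed output above (columns a and b only)
df = pd.DataFrame({
    "a": [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "b": [1, 2, 3, 2, 1, 3, 1, 3, 2, 2, 1],
})

# True wherever the value differs from the previous row
# (the first row compares against NaN, so it is always True)
starts = df["a"].ne(df["a"].shift())

# A running count of run starts gives each consecutive run its own id
run_id = starts.cumsum()

# Size of each run, broadcast back to every row of that run
run_len = df.groupby(run_id)["a"].transform("size")

# Keep the run length only where a == 1; all other rows get 0
df["c1"] = run_len.where(df["a"].eq(1), 0)

print(df.assign(run_id=run_id, run_len=run_len))
```

The run_id column is the piece that shift, cumsum, or cumcount alone cannot provide: it only appears once the "changed from previous row" mask is accumulated.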

If column a contains only the values 0 and 1, it is also possible to multiply by a instead:

df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
              .transform('size')
              .mul(df.a))
print (df)
    a  b  c  c1
0   1  1  1   1
1   0  2  0   0
2   1  3  2   2
3   1  2  2   2
4   0  1  0   0
5   1  3  3   3
6   1  1  3   3
7   1  3  3   3
8   0  2  0   0
9   1  2  2   2
10  1  1  2   2
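
The question's end goal is the mean of "b" per value of the new column. A possible follow-up, assuming the same data as in the printed output, is sketched below; mean_by_len and mean_by_run are illustrative names. Note that grouping by the length column pools all runs that share a length (for example, both runs of length 2), so a per-run mean is shown as well in case that is what is actually needed.

```python
import pandas as pd

# Data copied from the printed output above (columns a and b only)
df = pd.DataFrame({
    "a": [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "b": [1, 2, 3, 2, 1, 3, 1, 3, 2, 2, 1],
})

# Id of each consecutive run, then run length zeroed out where a == 0
run_id = df["a"].ne(df["a"].shift()).cumsum()
df["c1"] = df.groupby(run_id)["a"].transform("size").mul(df["a"])

# Mean of b for each distinct run length (runs of equal length are pooled)
mean_by_len = df.groupby("c1")["b"].mean()
print(mean_by_len)

# Alternative: mean of b for each individual run of ones
ones = df["a"].eq(1)
mean_by_run = df[ones].groupby(run_id[ones])["b"].mean()
print(mean_by_run)
```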
