简体   繁体   中英

How to assign a group based on consecutive sum in pandas

I have a dataframe as below:

A         B          code cumul_sum
group1    group1_1    A     1
group1    group1_1    A     2
group1    group1_1    B     1
group1    group1_1    A     1
group1    group1_1    A     2
group1    group1_1    A     3 
group2    group2_1    A     1
group2    group2_1    A     2
group2    group2_1    A     3 

Please assume column A and B are belonging to same category. I wanted to group and the expected result are as follows

A         B          code cumul_sum   **groupcat**
group1    group1_1    A     1          group1
group1    group1_1    A     2          group1
group1    group1_1    B     1          group2
group1    group1_1    A     1          group3
group1    group1_1    A     2          group3
group1    group1_1    A     3          group3
group2    group2_1    A     1          group1
group2    group2_1    A     2          group1
group2    group2_1    A     3          group1

Since there is a code B that occurred in the middle, group1 is not assigned to the third row and then the next value is a fresh categorical value.

EDIT: To also reset the group-cat for group A and B.

Please advice

You can modify previous solution :

#tested consecutive values by code column
s = df['code'].ne(df['code'].shift()).cumsum()
df['groupcat'] = 'group' + s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3

#tested consecutive groups by A, B, code
s = df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1).cumsum()
df['groupcat'] = 'group' + s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3

EDIT:

s = df['code'].ne(df['code'].shift()).cumsum()


df['group'] = ('group' + df.assign(s=s)
                           .groupby(['A','B'])['s']
                           .rank('dense').astype(int).astype(str))
print (df)
        A         B code  cumul_sum   group
0  group1  group1_1    A          1  group1
1  group1  group1_1    A          2  group1
2  group1  group1_1    B          1  group2
3  group1  group1_1    A          1  group3
4  group1  group1_1    A          2  group3
5  group1  group1_1    A          3  group3
6  group2  group2_1    A          1  group1
7  group2  group2_1    A          2  group1
8  group2  group2_1    A          3  group1

Alternative solution:

s = (df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1)
        .groupby([df.A, df.B]).cumsum())
df['groupcat'] = 'group' + s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3
6  group2  group2_1    A          1   group1
7  group2  group2_1    A          2   group1
8  group2  group2_1    A          3   group1

Since your groups ignore the code, you can just restart on each 1 value:

df['groupcat'] = df['cumul_sum'].eq(1).cumsum()

As string:

df['groupcat'] = df['cumul_sum'].eq(1).cumsum().astype(str).radd('group')

output:

        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM