
Pandas Count Group Number

Given the following dataframe:

import pandas as pd

df=pd.DataFrame({'col1':['A','A','A','A','A','A','B','B','B','B','B','B'],
                'col2':['x','x','y','z','y','y','x','y','y','z','z','x'],
                })
df

    col1    col2
0   A       x
1   A       x
2   A       y
3   A       z
4   A       y
5   A       y
6   B       x
7   B       y
8   B       y
9   B       z
10  B       z
11  B       x

I'd like to create a new column, col3, that labels each value in col2 with the number of the consecutive run it belongs to, counting the runs of each value separately within each col1 group:

    col1    col2    col3
0   A       x       x1
1   A       x       x1
2   A       y       y1
3   A       z       z1
4   A       y       y2
5   A       y       y2
6   B       x       x1
7   B       y       y1
8   B       y       y1
9   B       z       z1
10  B       z       z1
11  B       x       x2

In the example above, col3 is x1 in rows 0 and 1 because they form the first run of x values in col2 for col1 = A. Rows 4 and 5 get y2 because they form the second run of y values in col2 for col1 = A, and so on.
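The rule can also be stated as a plain loop. The following is a minimal sketch (the helper name `label_runs` is hypothetical, not part of pandas) that is easy to check by hand and could serve as a reference to test a vectorized version against. It assumes the rows are already in the order in which runs should be counted.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('AAAAAABBBBBB'),
                   'col2': list('xxyzyyxyyzzx')})

def label_runs(col1, col2):
    """Hypothetical helper: label each row with its col2 value plus the
    number of the run it belongs to, counted per (col1, col2) pair."""
    labels = []
    counts = {}   # (col1, col2) -> number of runs seen so far
    prev = None   # (col1, col2) of the previous row
    for key in zip(col1, col2):
        if key != prev:  # value or group changed, so a new run starts here
            counts[key] = counts.get(key, 0) + 1
        labels.append(f"{key[1]}{counts[key]}")
        prev = key
    return labels

df['col3'] = label_runs(df['col1'], df['col2'])
print(df['col3'].tolist())
# → ['x1', 'x1', 'y1', 'z1', 'y2', 'y2', 'x1', 'y1', 'y1', 'z1', 'z1', 'x2']
```

Because the key includes col1, a run never continues across a col1 boundary, even if the last value of one group equals the first value of the next.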

I hope the description makes sense. I couldn't find an answer, partly because I can't find a concise way to describe what I'm looking for.

Here's my approach:

groups = (df.assign(s=df.groupby('col1')['col2']   # group col2 by col1
                    .shift().ne(df['col2'])        # True where col2 differs from the previous row in the same col1 group
                    .astype(int)                   # convert to 0/1
                   )   # the new column s marks the start of each consecutive block with `1`
          .groupby(['col1','col2'])['s']           # group `s` by `col1` and `col2`
          .cumsum()                                # running count of block starts = block number
          .astype(str)                             # convert to str so it can be appended to col2
         )

df['col3'] = df['col2'] + groups

Output:

   col1 col2 col3
0     A    x   x1
1     A    x   x1
2     A    y   y1
3     A    z   z1
4     A    y   y2
5     A    y   y2
6     B    x   x1
7     B    y   y1
8     B    y   y1
9     B    z   z1
10    B    z   z1
11    B    x   x2
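To see why the approach works, it can help to look at the intermediate values. The sketch below recomputes the same two steps as separate, visible columns; the names `starts`, `run_no`, `steps` and `run` are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('AAAAAABBBBBB'),
                   'col2': list('xxyzyyxyyzzx')})

# Step 1: 1 where a row starts a new run of col2 values inside its col1 group.
# groupby(...).shift() never looks across col1 groups, so the first row of each
# group is compared against NaN and is always marked as a run start.
starts = df.groupby('col1')['col2'].shift().ne(df['col2']).astype(int)

# Step 2: the running count of run starts per (col1, col2) pair is the run number.
run_no = starts.groupby([df['col1'], df['col2']]).cumsum()

steps = df.assign(s=starts, run=run_no, col3=df['col2'] + run_no.astype(str))
print(steps)
```

Reading the `s` column top to bottom shows exactly where each run begins; `run` then only increases for a given (col1, col2) pair when that pair starts a new run.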
