Pandas Count Group Number

Given the following dataframe:

import pandas as pd

df = pd.DataFrame({'col1': ['A','A','A','A','A','A','B','B','B','B','B','B'],
                   'col2': ['x','x','y','z','y','y','x','y','y','z','z','x']})
df

    col1    col2
0   A       x
1   A       x
2   A       y
3   A       z
4   A       y
5   A       y
6   B       x
7   B       y
8   B       y
9   B       z
10  B       z
11  B       x

I'd like to create a new column, col3, which numbers each run of consecutive identical values in col2 sequentially, grouped by the values in col1:

    col1    col2    col3
0   A       x       x1
1   A       x       x1
2   A       y       y1
3   A       z       z1
4   A       y       y2
5   A       y       y2
6   B       x       x1
7   B       y       y1
8   B       y       y1
9   B       z       z1
10  B       z       z1
11  B       x       x2

In the above example, rows 0 and 1 of col3 have the value x1 because they form the first group of x values in col2 for col1 = A. Rows 4 and 5 have the value y2 because they form the second group of y values in col2 for col1 = A, and so on.

I hope the description makes sense. I was unable to find an answer, partly because I can't find an elegant way to articulate what I'm looking for.

Here's my approach:

groups = (df.assign(s=df.groupby('col1')['col2']   # group col2 by col1
                    .shift().ne(df['col2'])        # check if col2 different from the previous (shift)
                    .astype(int)                   # convert to int
                   )   # the new column s marks the beginning of consecutive blocks with `1`
          .groupby(['col1','col2'])['s']           # group `s` by `col1` and `col2`
          .cumsum()                                # cumsum by group
          .astype(str)
         )

df['col3'] = df['col2'] + groups

Output:

   col1 col2 col3
0     A    x   x1
1     A    x   x1
2     A    y   y1
3     A    z   z1
4     A    y   y2
5     A    y   y2
6     B    x   x1
7     B    y   y1
8     B    y   y1
9     B    z   z1
10    B    z   z1
11    B    x   x2
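
To see why the chained expression works, it may help to split it into named steps and look at the intermediate columns. The following is only a sketch of the same idea. The names prev, starts, run_no and steps are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': list('AAAAAABBBBBB'),
    'col2': ['x', 'x', 'y', 'z', 'y', 'y', 'x', 'y', 'y', 'z', 'z', 'x'],
})

# Step 1: previous col2 value within each col1 group
# (missing on the first row of every group).
prev = df.groupby('col1')['col2'].shift()

# Step 2: 1 where a new run of identical values starts, 0 otherwise.
# A missing previous value never equals the current one, so each
# col1 group always starts with a 1.
starts = prev.ne(df['col2']).astype(int)

# Step 3: running count of run starts per (col1, col2) pair.
run_no = starts.groupby([df['col1'], df['col2']]).cumsum()

# Collect the intermediate columns next to the final label (illustrative names).
steps = df.assign(prev=prev, starts=starts, run_no=run_no,
                  col3=df['col2'] + run_no.astype(str))
print(steps)
```

The starts column is what separates this from a plain cumcount: the counter for a given (col1, col2) pair only advances when a new run begins, so rows inside the same run share one number.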
