Pandas count group number

Question

Given the following DataFrame:

import pandas as pd

df = pd.DataFrame({'col1': ['A','A','A','A','A','A','B','B','B','B','B','B'],
                   'col2': ['x','x','y','z','y','y','x','y','y','z','z','x'],
                   })
df

    col1    col2
0   A       x
1   A       x
2   A       y
3   A       z
4   A       y
5   A       y
6   B       x
7   B       y
8   B       y
9   B       z
10  B       z
11  B       x

I'd like to create a new column, col3, which labels each run of consecutive identical values in col2 with a sequential number, counted separately within each value of col1:

    col1    col2    col3
0   A       x       x1
1   A       x       x1
2   A       y       y1
3   A       z       z1
4   A       y       y2
5   A       y       y2
6   B       x       x1
7   B       y       y1
8   B       y       y1
9   B       z       z1
10  B       z       z1
11  B       x       x2

In the above example, rows 0 and 1 of col3 have the value x1 because they are the first group of x values in col2 for col1 = A. Rows 4 and 5 have the value y2 because they are the second group of y values in col2 for col1 = A, and so on.

I hope the description makes sense. I was unable to find an answer, partly because I can't find an elegant way to articulate what I'm looking for.

Answer 1

Here's my approach:

groups = (df.assign(s=df.groupby('col1')['col2']   # group col2 by col1
                    .shift().ne(df['col2'])        # check if col2 different from the previous (shift)
                    .astype(int)                   # convert to int
                   )   # the new column s marks the beginning of consecutive blocks with `1`
          .groupby(['col1','col2'])['s']           # group `s` by `col1` and `col2`
          .cumsum()                                # cumsum by group
          .astype(str)
         )

df['col3'] = df['col2'] + groups
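
To see why the chained expression works, it may help to split it into its intermediate steps. The sketch below rebuilds the sample frame and uses the names prev, starts, run_no and steps, which are illustrative only and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': list('AAAAAABBBBBB'),
    'col2': list('xxyzyyxyyzzx'),
})

# Previous col2 value within each col1 group (missing on each group's first row)
prev = df.groupby('col1')['col2'].shift()

# 1 where a new run of identical col2 values starts, 0 otherwise
starts = prev.ne(df['col2']).astype(int)

# Running count of run starts per (col1, col2) pair, i.e. the run number
run_no = starts.groupby([df['col1'], df['col2']]).cumsum()

# Side-by-side view of every intermediate column
steps = df.assign(prev=prev, starts=starts, run_no=run_no)
print(steps)
```

The first row of each col1 group has no previous value, so the comparison reports a difference there and the row is counted as the start of a run.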

Output:

   col1 col2 col3
0     A    x   x1
1     A    x   x1
2     A    y   y1
3     A    z   z1
4     A    y   y2
5     A    y   y2
6     B    x   x1
7     B    y   y1
8     B    y   y1
9     B    z   z1
10    B    z   z1
11    B    x   x2
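
As a cross-check on the output above, the same labels can be produced without pandas. This is only a sketch: label_runs is a hypothetical helper, and it assumes that all rows for a given col1 value are contiguous, as they are in the sample data. If col1 values were interleaved, it would not match the pandas approach.

```python
from collections import defaultdict
from itertools import groupby

def label_runs(col1, col2):
    """Hypothetical helper: label each run of equal col2 values per col1 value.

    Assumes rows belonging to the same col1 value are contiguous.
    """
    labels = []
    seen = defaultdict(int)  # (col1 value, col2 value) -> number of runs so far
    # itertools.groupby yields one group per run of identical consecutive pairs
    for key, run in groupby(zip(col1, col2)):
        seen[key] += 1
        labels.extend(f"{key[1]}{seen[key]}" for _ in run)
    return labels

col1 = list('AAAAAABBBBBB')
col2 = list('xxyzyyxyyzzx')
labels = label_runs(col1, col2)
print(labels)
```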

Solution 1
1 Accepted 2020-02-20 04:53:26