Pandas Count Group Number
Given the following dataframe:
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'col2': ['x', 'x', 'y', 'z', 'y', 'y', 'x', 'y', 'y', 'z', 'z', 'x'],
})
df
col1 col2
0 A x
1 A x
2 A y
3 A z
4 A y
5 A y
6 B x
7 B y
8 B y
9 B z
10 B z
11 B x
I'd like to create a new column, col3, which labels each consecutive run of values in col2 sequentially, grouped by the values in col1:
col1 col2 col3
0 A x x1
1 A x x1
2 A y y1
3 A z z1
4 A y y2
5 A y y2
6 B x x1
7 B y y1
8 B y y1
9 B z z1
10 B z z1
11 B x x2
In the above example, col3[0:1] (rows 0 and 1) has a value of x1 because it's the first group of x values in col2 for col1 = A. Likewise, col3[4:5] (rows 4 and 5) has a value of y2 because it's the second group of y values in col2 for col1 = A, and so on.
I hope the description makes sense. I was unable to find an answer, partly because I can't find an elegant way to articulate what I'm looking for.
Here's my approach:
groups = (
    df.assign(
        s=df.groupby('col1')['col2']  # group col2 by col1
          .shift()                    # previous col2 value within each col1 group
          .ne(df['col2'])             # True where col2 differs from the previous row
          .astype(int)                # convert to int
    )  # the new column s marks the beginning of each consecutive block with 1
    .groupby(['col1', 'col2'])['s']   # group s by col1 and col2
    .cumsum()                         # cumsum by group
    .astype(str)
)
df['col3'] = df['col2'] + groups
Output:
col1 col2 col3
0 A x x1
1 A x x1
2 A y y1
3 A z z1
4 A y y2
5 A y y2
6 B x x1
7 B y y1
8 B y y1
9 B z z1
10 B z z1
11 B x x2
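To make the chained expression easier to follow, here is a minimal sketch that splits the same idea into separate steps and keeps the intermediate columns. The names prev, s and run are only illustrative, and the frame is rebuilt so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': list('AAAAAABBBBBB'),
    'col2': list('xxyzyyxyyzzx'),
})

# Previous col2 value within each col1 group (missing on the first row of a group).
prev = df.groupby('col1')['col2'].shift()

# 1 where a new run of identical col2 values starts, 0 where the run continues.
df['s'] = prev.ne(df['col2']).astype(int)

# Running total of run starts per (col1, col2) pair gives the run number.
df['run'] = df.groupby(['col1', 'col2'])['s'].cumsum()

# Append the run number to the col2 value to build the label.
df['col3'] = df['col2'] + df['run'].astype(str)
print(df)
```

Printing s next to col2 shows where each block begins, which is usually the quickest way to check the logic on your own data before dropping the helper columns.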