
Numbering Groups In Pandas DataFrame

Is there a way in Pandas to number groups in a DataFrame, based on column values? If my frame looks like this:

  Column1 Column2  Column3
0       A       X       23
1       A       X       45
2       A       Y       32
3       A       Y       53
4       A       Y       67
5       B       X       85
6       B       Y       12
7       B       Y       94

What I'd like to be able to do is something like:

df.group_numbers(['Column1', 'Column2'])

  Column1 Column2  Column3  GroupNumber
0       A       X       23            1
1       A       X       45            1
2       A       Y       32            2
3       A       Y       53            2
4       A       Y       67            2
5       B       X       85            3    
6       B       Y       12            4
7       B       Y       94            4

As suggested in ajcr's comment, pd.factorize is the way to go. In your case you can quickly create an array of keys by concatenating the two columns with a delimiter between them. The delimiter, as suggested by DSM, keeps pairs such as ('ab', 'c') and ('a', 'bc') from collapsing into the same key.

# pd.factorize returns (codes, uniques); codes start at 0, so add 1
df['GroupNumber'] = pd.factorize(df.Column1 + ' ' + df.Column2)[0] + 1
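For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the factorize approach. Keep in mind that pd.factorize returns a pair (codes, uniques), so only the codes are assigned to the column, and 1 is added because the codes start at 0. The variable names keys, codes and uniques are just illustrative.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "Column1": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "Column2": ["X", "X", "Y", "Y", "Y", "X", "Y", "Y"],
    "Column3": [23, 45, 32, 53, 67, 85, 12, 94],
})

# Build one string key per row, with a space as the delimiter.
keys = df["Column1"] + " " + df["Column2"]

# factorize labels each distinct key in order of first appearance.
codes, uniques = pd.factorize(keys)

# Shift the 0-based codes so that numbering starts at 1.
df["GroupNumber"] = codes + 1

print(df)
```

Because the labels follow order of first appearance, the numbering matches the desired output in the question only as long as the rows of each group show up in that order.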

It's still faster than using pd.lib.fast_zip.
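To see why the delimiter matters, here is a small sketch on a made-up two-row frame (the frame pairs and its columns left and right are hypothetical, not from the question). It also shows groupby(...).ngroup(), which is not the approach from this answer but an alternative available in newer pandas versions that groups on the columns directly and so needs no string keys.

```python
import pandas as pd

# Hypothetical frame where plain concatenation collides:
# "ab" + "c" and "a" + "bc" both give "abc".
pairs = pd.DataFrame({"left": ["ab", "a"], "right": ["c", "bc"]})

# Without a delimiter, the two distinct pairs share one key.
no_delim = pd.factorize(pairs["left"] + pairs["right"])[0]

# With a delimiter, the keys stay distinct.
with_delim = pd.factorize(pairs["left"] + " " + pairs["right"])[0]

# Alternative: number the groups directly from the columns.
# sort=False numbers groups in order of first appearance;
# the default (sort=True) numbers them in sorted key order.
by_group = pairs.groupby(["left", "right"], sort=False).ngroup()

print(no_delim, with_delim, by_group.tolist())
```

A space is only safe as a delimiter if it cannot occur in the column values themselves; otherwise pick a character that is known to be absent, or use the groupby variant.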


 