Numbering Groups In Pandas DataFrame
Is there a way in Pandas to number groups in a DataFrame, based on column values? If my frame looks like this:
Column1 Column2 Column3
0 A X 23
1 A X 45
2 A Y 32
3 A Y 53
4 A Y 67
5 B X 85
6 B Y 12
7 B Y 94
What I'd like to be able to do is something like:
df.group_numbers(['Column1', 'Column2'])
Column1 Column2 Column3 GroupNumber
0 A X 23 1
1 A X 45 1
2 A Y 32 2
3 A Y 53 2
4 A Y 67 2
5 B X 85 3
6 B Y 12 4
7 B Y 94 4
As suggested in ajcr's comment, pd.factorize is the way to go. In your case you can quickly create an array of keys by concatenating the two columns with some delimiter between them. The delimiter, as suggested by DSM, is there to avoid confusing pairs such as ab, c and a, bc, which would otherwise both produce the key abc.
# pd.factorize returns (codes, uniques); take the codes and add 1 so numbering starts at 1
df['GroupNumber'] = pd.factorize(df.Column1 + ' ' + df.Column2)[0] + 1
It's still faster than using pd.lib.fast_zip.
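For reference, here is a minimal runnable sketch of the approach on the frame from the question. The frame is rebuilt by hand from the sample data, and the variable names (codes, uniques, pairs, no_delim, with_delim) are only illustrative. The groupby(...).ngroup() line is not part of the original answer; it is shown as an assumed alternative that is available in reasonably recent pandas versions and avoids building string keys altogether.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Column1": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "Column2": ["X", "X", "Y", "Y", "Y", "X", "Y", "Y"],
    "Column3": [23, 45, 32, 53, 67, 85, 12, 94],
})

# pd.factorize returns a (codes, uniques) pair. The codes start at 0,
# so add 1 to get the 1-based numbering asked for in the question.
codes, uniques = pd.factorize(df["Column1"] + " " + df["Column2"])
df["GroupNumber"] = codes + 1

# Alternative (not from the original answer): number the groups directly.
# sort=False numbers groups in order of first appearance, like factorize.
df["GroupNumberAlt"] = (
    df.groupby(["Column1", "Column2"], sort=False).ngroup() + 1
)

# Why the delimiter matters: without it, ("ab", "c") and ("a", "bc")
# both concatenate to "abc" and end up in the same group.
pairs = pd.DataFrame({"a": ["ab", "a"], "b": ["c", "bc"]})
no_delim = pd.factorize(pairs["a"] + pairs["b"])[0]
with_delim = pd.factorize(pairs["a"] + " " + pairs["b"])[0]

print(df)
```

Note that a delimiter only helps if it cannot appear in the data itself; if the columns may contain spaces, the groupby route sidesteps the problem.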