Numbering groups in a Pandas DataFrame

Question

Is there a way in Pandas to number groups in a DataFrame, based on column values? If my frame looks like this:

  Column1 Column2  Column3
0       A       X       23
1       A       X       45
2       A       Y       32
3       A       Y       53
4       A       Y       67
5       B       X       85
6       B       Y       12
7       B       Y       94

What I'd like to be able to do is something like this:

df.group_numbers(['Column1', 'Column2'])

  Column1 Column2  Column3  GroupNumber
0       A       X       23            1
1       A       X       45            1
2       A       Y       32            2
3       A       Y       53            2
4       A       Y       67            2
5       B       X       85            3
6       B       Y       12            4
7       B       Y       94            4

Answer 1

As suggested in ajcr's comment, pd.factorize is the way to go. In your case, you can quickly create an array of keys by concatenating the two columns with a delimiter between them. As DSM suggested, the delimiter avoids confusing pairs such as ("ab", "c") and ("a", "bc"), which would otherwise produce the same key.

df['GroupNumber'] = pd.factorize(df.Column1 + ' ' + df.Column2)[0] + 1
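For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question. Note that pd.factorize returns a pair (codes, uniques), and the codes start at 0, so the sketch adds 1 to match the numbering asked for. The variable names keys, codes and uniques are just illustrative choices.

```python
import pandas as pd

# Frame from the question
df = pd.DataFrame({
    "Column1": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "Column2": ["X", "X", "Y", "Y", "Y", "X", "Y", "Y"],
    "Column3": [23, 45, 32, 53, 67, 85, 12, 94],
})

# One string key per row; the space keeps ("ab", "c") distinct from ("a", "bc")
keys = df["Column1"] + " " + df["Column2"]

# codes are 0-based integers assigned in order of first appearance
codes, uniques = pd.factorize(keys)

# Shift to 1-based numbering to match the desired output
df["GroupNumber"] = codes + 1

print(df)
```

This assumes both columns hold strings and that no value contains the delimiter itself; if either assumption fails, a different delimiter or a different approach would be needed.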

It's still faster than using pd.lib.fast_zip.
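As an alternative that is not part of the answer above, reasonably recent pandas versions provide GroupBy.ngroup, which numbers groups directly without building string keys. The following is a sketch under that assumption; it makes no claim about relative speed.

```python
import pandas as pd

# Frame from the question
df = pd.DataFrame({
    "Column1": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "Column2": ["X", "X", "Y", "Y", "Y", "X", "Y", "Y"],
    "Column3": [23, 45, 32, 53, 67, 85, 12, 94],
})

# sort=False numbers groups in order of first appearance, like pd.factorize;
# ngroup() is 0-based, so add 1 for 1-based group numbers
df["GroupNumber"] = df.groupby(["Column1", "Column2"], sort=False).ngroup() + 1

print(df)
```

Because the grouping works on the column values themselves, this avoids the delimiter question entirely and also works when the columns are not strings.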

Answer 1: accepted, 2015-10-30 20:06:32
