Is there a way in Pandas to number groups in a DataFrame, based on column values? Suppose my frame looks like this:
  Column1 Column2  Column3
0       A       X       23
1       A       X       45
2       A       Y       32
3       A       Y       53
4       A       Y       67
5       B       X       85
6       B       Y       12
7       B       Y       94
What I'd like to be able to do is something like this:
df.group_numbers(['Column1', 'Column2'])
  Column1 Column2  Column3  GroupNumber
0       A       X       23            1
1       A       X       45            1
2       A       Y       32            2
3       A       Y       53            2
4       A       Y       67            2
5       B       X       85            3
6       B       Y       12            4
7       B       Y       94            4
As suggested in ajcr's comment, pd.factorize is the way to go. In your case you can quickly create an array of keys by concatenating the two columns with a delimiter between them. As DSM suggested, the delimiter avoids confusing pairs such as ('ab', 'c') and ('a', 'bc'), which would otherwise both produce the key 'abc'.
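To see why the delimiter matters, here is a small sketch using a made-up two-row frame (the `pairs` frame and its values are illustrative assumptions, not data from the question). Without a delimiter both rows collapse to the same key; with one they stay distinct. This only holds as long as the delimiter itself never appears in the column values.

```python
import pandas as pd

# Hypothetical frame whose two rows would collide without a delimiter.
pairs = pd.DataFrame({"Column1": ["ab", "a"], "Column2": ["c", "bc"]})

# Plain concatenation: both rows become the same key string.
no_delim = pairs["Column1"] + pairs["Column2"]

# Concatenation with a space between the parts: the keys differ.
with_delim = pairs["Column1"] + " " + pairs["Column2"]

# factorize returns (codes, uniques); element [0] is the integer codes.
codes_no_delim = pd.factorize(no_delim)[0]
codes_with_delim = pd.factorize(with_delim)[0]

print(codes_no_delim)    # both rows land in the same group
print(codes_with_delim)  # each row gets its own group
```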
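# factorize returns a (codes, uniques) tuple; take the 0-based codes and add 1 for 1-based group numbers
df['GroupNumber'] = pd.factorize(df.Column1 + ' ' + df.Column2)[0] + 1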
It's still faster than using pd.lib.fast_zip.
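For reference, here is a self-contained sketch of the factorize approach on the frame from the question. Since pd.factorize returns a (codes, uniques) tuple with 0-based codes, the sketch keeps only the codes and adds 1 to match the desired numbering. The second column, `GroupNumberAlt`, is not part of the answer above: it is an alternative using `groupby(...).ngroup()`, which should be available in reasonably recent pandas versions, and the column name is made up for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Column1": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "Column2": ["X", "X", "Y", "Y", "Y", "X", "Y", "Y"],
    "Column3": [23, 45, 32, 53, 67, 85, 12, 94],
})

# One string key per row; the space keeps ("ab", "c") apart from ("a", "bc").
keys = df["Column1"] + " " + df["Column2"]

# codes are 0-based and assigned in order of first appearance.
codes, uniques = pd.factorize(keys)
df["GroupNumber"] = codes + 1

# Alternative (an assumption, not from the answer): number the groups directly.
# sort=False keeps first-appearance order, matching factorize.
df["GroupNumberAlt"] = (
    df.groupby(["Column1", "Column2"], sort=False).ngroup() + 1
)

print(df)
```

The ngroup variant avoids building string keys altogether, so it also works when the grouping columns are not strings.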