I have seen a solution for this in R but not in Python. If this question is a duplicate, please point me to the previously asked question or solution.
I have a DataFrame as follows:
import pandas as pd
df = pd.DataFrame({'col1': ['a','b','c','c','d','e','a','h','i','a'],'col2':['3:00','3:00','4:00','4:00','3:00','5:00','5:00','3:00','3:00','2:00']})
df
Out[83]:
col1 col2
0 a 3:00
1 b 3:00
2 c 4:00
3 c 4:00
4 d 3:00
5 e 5:00
6 a 5:00
7 h 3:00
8 i 3:00
9 a 2:00
What I'd like to do is group by 'col1' and assign a unique ID to each distinct value in 'col2' within the group, as follows:
col1 col2 ID
a 2:00 0
a 3:00 1
a 5:00 2
b 3:00 0
c 4:00 0
c 4:00 0
...
I tried to use pd.Categorical but couldn't quite get the result I wanted. I would appreciate any help. Thanks.
We can use the pd.factorize() function, applied per group:
In [170]: df['ID'] = df.groupby('col1')['col2'].transform(lambda x: pd.factorize(x)[0])
In [171]: df
Out[171]:
col1 col2 ID
0 a 3:00 0
1 b 3:00 0
2 c 4:00 0
3 c 4:00 0
4 d 3:00 0
5 e 5:00 0
6 a 5:00 1
7 h 3:00 0
8 i 3:00 0
9 a 2:00 2
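Note that the IDs above are numbered in order of first appearance within each group (for 'a': 3:00 is 0, 5:00 is 1, 2:00 is 2), whereas the desired output in the question numbers them by sorted value (2:00 is 0, 3:00 is 1, 5:00 is 2). A minimal sketch of one way to get the sorted numbering is to pass sort=True to pd.factorize. The column name ID_sorted is only an illustrative choice, and the sort here is a plain string sort, so it assumes the time strings order correctly as text (a value such as '10:00' would sort before '2:00').

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'c', 'd', 'e', 'a', 'h', 'i', 'a'],
    'col2': ['3:00', '3:00', '4:00', '4:00', '3:00',
             '5:00', '5:00', '3:00', '3:00', '2:00'],
})

# IDs numbered by order of first appearance within each col1 group
# (same as the answer above).
df['ID'] = (
    df.groupby('col1')['col2']
      .transform(lambda x: pd.factorize(x)[0])
      .astype(int)
)

# IDs numbered by sorted value within each col1 group.
# 'ID_sorted' is a hypothetical column name used for illustration.
# Assumption: col2 holds strings, so this is a lexicographic sort.
df['ID_sorted'] = (
    df.groupby('col1')['col2']
      .transform(lambda x: pd.factorize(x, sort=True)[0])
      .astype(int)
)

# Sort the rows to mirror the layout shown in the question.
result = df.sort_values(['col1', 'col2']).reset_index(drop=True)
print(result)
```

Duplicate values within a group share the same ID in both variants, so the two 'c' rows with 4:00 both get 0.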