
Pandas: new column with unique values based on condition

I need to create a new "identifier" column that holds a unique value for each combination of values in two columns. For example, rows with the same ID and phase (e.g. r1 and ph1) should share the same identifier, while a different combination (e.g. r1 and ph2) should get a new, unique value.

df
ID   phase   side   values
r1   ph1     l      12
r1   ph1     r      34
r1   ph2     l      93
s4   ph3     l      21
s3   ph2     l      88
s3   ph2     r      54
...

I would need a new column (idx) like so:

new_df
ID   phase   side   values    idx
r1   ph1     l      12        1
r1   ph1     r      34        1
r1   ph2     l      93        2
s4   ph3     l      21        3
s3   ph2     l      88        4
s3   ph2     r      54        4
...

I've tried applying code from this question but could not find a way to increment the values in idx.

Any suggestion on how to accomplish this would be very welcome!

Use groupby with ngroup and add 1, since ngroup numbers the groups starting from 0. Pass sort=False so that groups are numbered in the order they first appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
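
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the six sample rows from the question by hand (the elided "..." rows are left out), and the idx_sorted column is a hypothetical extra, added only to show what happens when sort=False is omitted:

```python
import pandas as pd

# Sample rows copied from the question (the elided rows are not included).
df = pd.DataFrame({
    "ID": ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase": ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
    "side": ["l", "r", "l", "l", "l", "r"],
    "values": [12, 34, 93, 21, 88, 54],
})

# sort=False: groups are numbered in order of first appearance.
# ngroup() is zero-based, so add 1 to start the identifiers at 1.
df["idx"] = df.groupby(["ID", "phase"], sort=False).ngroup() + 1

# Hypothetical comparison column: with the default sort=True, groups are
# numbered by sorted key order, so (s3, ph2) is numbered before (s4, ph3).
df["idx_sorted"] = df.groupby(["ID", "phase"]).ngroup() + 1

print(df)
```

With this data, idx comes out as 1, 1, 2, 3, 4, 4, while idx_sorted comes out as 1, 1, 2, 4, 3, 3. Both are valid unique identifiers per (ID, phase) pair; sort=False only matters if the numbering should follow row order as in the desired output.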

