Pandas: new column with unique values based on condition

Question

I need to create a new "identifier" column with a unique value for each combination of values in two columns. For example, rows should share the same identifier when ID and phase are both the same (e.g. r1 and ph1), but a new, unique value should be used when the combination changes (e.g. r1 and ph2).

df
ID   phase   side   values
r1   ph1     l      12
r1   ph1     r      34
r1   ph2     l      93
s4   ph3     l      21
s3   ph2     l      88
s3   ph2     r      54
...

I would need a new column (idx) like so:

new_df
ID   phase   side   values    idx
r1   ph1     l      12        1
r1   ph1     r      34        1
r1   ph2     l      93        2
s4   ph3     l      21        3
s3   ph2     l      88        4
s3   ph2     r      54        4
...

I've tried applying code from this question but could not find a way to increment the values in idx.

Any suggestion on how to accomplish this would be very welcome!

Answer 1

Try groupby with ngroup, adding 1 because ngroup numbers groups starting from 0. Pass sort=False so that groups are numbered in the order they first appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
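To make the effect of sort=False concrete, here is a minimal self-contained sketch that rebuilds the sample frame from the question and compares both settings. The idx_sorted column is a hypothetical name added only for the comparison; it is not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "ID": ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase": ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
    "side": ["l", "r", "l", "l", "l", "r"],
    "values": [12, 34, 93, 21, 88, 54],
})

# sort=False: groups are numbered in order of first appearance.
# ngroup() is 0-based, so add 1 to start the identifiers at 1.
df["idx"] = df.groupby(["ID", "phase"], sort=False).ngroup() + 1

# Default (sort=True): groups are numbered by sorted key order,
# so (s3, ph2) gets a lower number than (s4, ph3).
# "idx_sorted" is an illustrative column name, not from the answer.
df["idx_sorted"] = df.groupby(["ID", "phase"]).ngroup() + 1

print(df)
```

With this data, idx comes out as 1, 1, 2, 3, 4, 4 (matching the desired output), while idx_sorted comes out as 1, 1, 2, 4, 3, 3, because s3 sorts before s4.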

1 answer

solution1
3 ACCEPTED 2021-06-07 14:44:18
