
Pandas: new column with unique values based on condition

I need to create a new "identifier column" with a unique value for each combination of values in two columns. For example, the same identifier should be used whenever ID and phase are both the same (e.g. r1 and ph1), but a new, unique value should be assigned when the combination changes (e.g. r1 and ph2).

df
ID   phase   side   values
r1   ph1     l      12
r1   ph1     r      34
r1   ph2     l      93
s4   ph3     l      21
s3   ph2     l      88
s3   ph2     r      54
...

I would need a new column (idx) like so:

new_df
ID   phase   side   values    idx
r1   ph1     l      12        1
r1   ph1     r      34        1
r1   ph2     l      93        2
s4   ph3     l      21        3
s3   ph2     l      88        4
s3   ph2     r      54        4
...

I've tried applying code from this question but could not find a way to increment the values in idx.

Any suggestion on how to accomplish this would be very welcome!

Try groupby with ngroup, adding 1 so that the numbering starts at 1 instead of 0. Pass sort=False so that groups are numbered in the order they first appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
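For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It assumes the sample data from the question (the rows hidden behind "..." are not reproduced), and the names idx_sorted and expected_pairs are introduced here purely for illustration. It also shows what happens with the default sort=True, where groups are numbered by sorted key order rather than by first appearance.

```python
import pandas as pd

# Sample data rebuilt from the question (only the six rows shown there).
df = pd.DataFrame({
    "ID":     ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase":  ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
    "side":   ["l", "r", "l", "l", "l", "r"],
    "values": [12, 34, 93, 21, 88, 54],
})

# sort=False: groups are numbered in order of first appearance.
# ngroup() is 0-based, so add 1 to start the identifiers at 1.
df["idx"] = df.groupby(["ID", "phase"], sort=False).ngroup() + 1

# For comparison only (hypothetical column name): with the default
# sort=True, groups are numbered by sorted (ID, phase) key instead,
# so (s3, ph2) is numbered before (s4, ph3).
df["idx_sorted"] = df.groupby(["ID", "phase"]).ngroup() + 1

# Sanity check: the number of distinct identifiers equals the number
# of distinct (ID, phase) combinations.
expected_pairs = len(df[["ID", "phase"]].drop_duplicates())

print(df)
```

Either variant gives each (ID, phase) pair its own identifier; sort=False only matters if the numbering should follow row order, as in the desired output from the question.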
