Pandas: new column with unique values based on condition
I need to create a new "identifier column" with unique values for each combination of values of two columns. For example, the same identifier should be used whenever ID and phase are the same (e.g. r1 and ph1), but a new, unique value should be added to the column for a different combination (e.g. r1 and ph2).
df
ID phase side values
r1 ph1 l 12
r1 ph1 r 34
r1 ph2 l 93
s4 ph3 l 21
s3 ph2 l 88
s3 ph2 r 54
...
I need a new column (idx) like this:
new_df
ID phase side values idx
r1 ph1 l 12 1
r1 ph1 r 34 1
r1 ph2 l 93 2
s4 ph3 l 21 3
s3 ph2 l 88 4
s3 ph2 r 54 4
...
I've tried applying code from this question but couldn't find a way to increment the values in idx.
Any suggestions on how to accomplish this would be very welcome!
Try groupby ngroup + 1, and use sort=False to ensure groups are numbered in the order they appear in the DataFrame:
df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1
df
ID phase side values idx
0 r1 ph1 l 12 1
1 r1 ph1 r 34 1
2 r1 ph2 l 93 2
3 s4 ph3 l 21 3
4 s3 ph2 l 88 4
5 s3 ph2 r 54 4
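A minimal, self-contained sketch that rebuilds the six sample rows from the question (the elided "..." rows are not reproduced) and compares sort=False with the default sort=True. The name sorted_idx is only for illustration and is not part of the answer above.

```python
import pandas as pd

# Sample data from the question (the "..." rows are omitted)
df = pd.DataFrame({
    "ID": ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase": ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
    "side": ["l", "r", "l", "l", "l", "r"],
    "values": [12, 34, 93, 21, 88, 54],
})

# sort=False: each (ID, phase) group is numbered by its first appearance
df["idx"] = df.groupby(["ID", "phase"], sort=False).ngroup() + 1

# Default sort=True: groups are numbered by sorted (ID, phase) keys instead
sorted_idx = df.groupby(["ID", "phase"]).ngroup() + 1

print(df)
print(sorted_idx.tolist())
```

With the default sorting, (s3, ph2) sorts before (s4, ph3), so those rows get 3 and 4 respectively instead of 4 and 3. That is why sort=False is needed to reproduce the desired idx column.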