How do I check that the unique values of a column exist in another column in a DataFrame?
I have a dataframe like this:
A= [ ID COL1 COL2
23 AA BB
23 AA AA
23 AA DD
23 BB BB
23 BB AA
23 BB DD
23 CC BB
23 CC AA
24 AA BB ]
What I want is to check whether each unique value of COL1 exists in COL2 for the same ID. The ID is not always the same number, and the check must be done only among rows with the same ID. I want a result like:
A= [ ID COL1 COL2 check
23 AA BB OK
23 AA AA OK
23 AA DD OK
23 BB BB OK
23 BB AA OK
23 BB DD OK
23 CC BB KO
23 CC AA KO
24 AA BB KO
]
I tried:
A['check'] = np.where(A.Col1.eq(A['Col2']).groupby(A['ID']).transform('any'), 'Anomalie', 'Valeur OK')
I'm not sure it's the right command. Can anyone help, please?
You just want to check whether a cell value exists in a container: isin is the way to go. But since you want to process the data ID by ID, you also need a groupby:
df['check'] = df.groupby(['ID', 'COL1'], group_keys=False
).apply(lambda x: x['COL1'].isin(x['COL2']))
It gives the expected result:
ID COL1 COL2 check
0 23 AA BB True
1 23 AA AA True
2 23 AA DD True
3 23 BB BB True
4 23 BB AA True
5 23 BB DD True
6 23 CC BB False
7 23 CC AA False
8 24 AA BB False
If you want OK/KO strings instead of boolean values, just add:
df['check'] = np.where(df['check'], 'OK', 'KO')
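For reference, here is a minimal self-contained sketch of the same check that avoids groupby().apply() altogether. It assumes the sample data from the question, rebuilt by hand, and the names pairs_in_col2 and found are only illustrative. The idea is to collect every (ID, COL2) pair once, then test each row's (ID, COL1) pair against that set:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   [23, 23, 23, 23, 23, 23, 23, 23, 24],
    "COL1": ["AA", "AA", "AA", "BB", "BB", "BB", "CC", "CC", "AA"],
    "COL2": ["BB", "AA", "DD", "BB", "AA", "DD", "BB", "AA", "BB"],
})

# Every (ID, COL2) pair that occurs anywhere in the frame.
pairs_in_col2 = set(zip(df["ID"], df["COL2"]))

# For each row, is its (ID, COL1) pair among the (ID, COL2) pairs?
found = [(i, c) in pairs_in_col2 for i, c in zip(df["ID"], df["COL1"])]

df["check"] = np.where(found, "OK", "KO")
print(df)
```

Note that this looks at all rows sharing an ID, which is how I read the question, whereas grouping on ['ID', 'COL1'] only looks at COL2 within each (ID, COL1) subgroup. On the sample data both give the same result, but they can differ on other data.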
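You can use apply and check whether the value is in Col2:
# `in` on a Series tests the index labels, so compare against .values
A['check'] = A[['ID', 'Col1']].apply(lambda row: 'OK' if row['Col1'] in A.loc[A['ID'] == row['ID'], 'Col2'].values else 'KO', axis=1)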
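One thing to watch when using the in operator with pandas: on a Series it tests the index labels, not the values. A small sketch with a made-up Series s (not taken from the question) shows the difference:

```python
import pandas as pd

# Hypothetical Series with the default integer index 0, 1, 2.
s = pd.Series(["AA", "BB", "DD"])

print("AA" in s)             # False: "AA" is not an index label
print(0 in s)                # True: 0 is an index label
print("AA" in s.values)      # True: membership among the values
print(s.isin(["AA"]).any())  # True: value test using isin
```

So for a value test, use either .values (or .to_numpy()) with in, or isin.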