I have a dataframe like this:
A= [ ID COL1 COL2
23 AA BB
23 AA AA
23 AA DD
23 BB BB
23 BB AA
23 BB DD
23 CC BB
23 CC AA
24 AA BB ]
What I want is to check that each unique value of COL1 exists in COL2 for the same ID. The ID is not always the same number, and the check must be done only among rows with the same ID. I want a result like:
A= [ ID COL1 COL2 check
23 AA BB OK
23 AA AA OK
23 AA DD OK
23 BB BB OK
23 BB AA OK
23 BB DD OK
23 CC BB KO
23 CC AA KO
24 AA BB KO
]
I tried:
A['check'] = np.where(A.Col1.eq(A['Col2']).groupby(A['ID']).transform('any'), 'Anomalie', 'Valeur OK')
I'm not sure it's the right command. Can anyone help, please?
You just want to check whether a cell value exists in a container: isin
is the way to go. But as you want to process the data ID by ID, you also need a groupby:
df['check'] = df.groupby(['ID', 'COL1'], group_keys=False
).apply(lambda x: x['COL1'].isin(x['COL2']))
It gives as expected:
ID COL1 COL2 check
0 23 AA BB True
1 23 AA AA True
2 23 AA DD True
3 23 BB BB True
4 23 BB AA True
5 23 BB DD True
6 23 CC BB False
7 23 CC AA False
8 24 AA BB False
If you want OK/KO strings instead of boolean values, just add:
df['check'] = np.where(df['check'], 'OK', 'KO')
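For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and produces the OK/KO column directly. It is a variant, not the exact code above: it groups on ID alone (an assumption about the intent) and uses a plain set lookup instead of apply on the grouped frame. The name col2_by_id is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'ID':   [23, 23, 23, 23, 23, 23, 23, 23, 24],
    'COL1': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'CC', 'CC', 'AA'],
    'COL2': ['BB', 'AA', 'DD', 'BB', 'AA', 'DD', 'BB', 'AA', 'BB'],
})

# Map each ID to the set of COL2 values seen for that ID
col2_by_id = df.groupby('ID')['COL2'].apply(set).to_dict()

# A row passes when its COL1 value appears in the COL2 set of its own ID
found = [c1 in col2_by_id[i] for i, c1 in zip(df['ID'], df['COL1'])]
df['check'] = np.where(found, 'OK', 'KO')

print(df)
```

On the sample data both groupings agree. They would differ if a COL1 value appeared in COL2 only on rows with a different COL1 for the same ID, so pick the grouping that matches what you actually need.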
You can use apply and check whether the value is in Col2 for the same ID:
A['check'] = A[['ID', 'Col1']].apply(lambda row: 'OK' if row['Col1'] in A.loc[A['ID'] == row['ID'], 'Col2'].values else 'KO', axis=1)
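A row-wise apply rescans the frame for every row, which can get slow on large data. One possible vectorized alternative, sketched here under the assumption that the columns are named ID, Col1 and Col2, is a left merge against the distinct (ID, Col2) pairs. The names pairs and merged are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the column names used in this answer
A = pd.DataFrame({
    'ID':   [23, 23, 23, 23, 23, 23, 23, 23, 24],
    'Col1': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'CC', 'CC', 'AA'],
    'Col2': ['BB', 'AA', 'DD', 'BB', 'AA', 'DD', 'BB', 'AA', 'BB'],
})

# Every distinct (ID, Col2) pair, relabelled so it can be joined on Col1
pairs = A[['ID', 'Col2']].drop_duplicates().rename(columns={'Col2': 'Col1'})

# The left join keeps one output row per row of A, in the original order;
# indicator=True adds a '_merge' column saying whether a match was found
merged = A[['ID', 'Col1']].merge(pairs, on=['ID', 'Col1'], how='left', indicator=True)

A['check'] = np.where((merged['_merge'] == 'both').to_numpy(), 'OK', 'KO')

print(A)
```

Because pairs holds each (ID, value) combination only once, the merge cannot duplicate rows of A, so the result lines up with the original frame position by position.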