简体   繁体   中英

How do i check that the unique values of a column exists in another column in dataframe?

I have a dataframe like this :


A= [ ID COL1 COL2  
     23  AA   BB    
     23  AA   AA   
     23  AA   DD   
     23  BB   BB 
     23  BB   AA
     23  BB   DD
     23  CC   BB
     23  CC   AA
     24  AA   BB  ]


What i want to is to check that the unique value of col1 exist in Col2 for the same ID ,The ID is not always the same number. the check must be done only among rows with the same id i want a result like :


A= [ ID COL1 COL2  check 
     23  AA   BB    OK 
     23  AA   AA    OK 
     23  AA   DD    OK
     23  BB   BB    OK 
     23  BB   AA    OK 
     23  BB   DD    OK 
     23  CC   BB    KO
     23  CC   AA    KO 
     24  AA   BB    KO 
]

i tried

 A['check'] = np.where(A.Col1.eq(A['Col2']).groupby(A['ID']).transform('any'), 'Anomalie', 'Valeur OK')

I'm not sur it s the right command ,can anyone help please ?

You just want to check whether a cell value exists in a container: isin is the way to go. But as you want to process id by ID, you also need a groupby:

df['check'] = df.groupby(['ID', 'COL1'], group_keys=False
                         ).apply(lambda x: x['COL1'].isin(x['COL2']))

It gives as expected:

   ID COL1 COL2  check
0  23   AA   BB   True
1  23   AA   AA   True
2  23   AA   DD   True
3  23   BB   BB   True
4  23   BB   AA   True
5  23   BB   DD   True
6  23   CC   BB  False
7  23   CC   AA  False
8  24   AA   BB  False

If you want OK/KO strings instead of boolean values, just add:

df['check'] = np.where(df['check'], 'OK', 'KO')

您可以申请并检查该值是否在 Col2 中:

A['check'] = A[['ID', 'Col1'].apply(lambda row: 'OK' if row['Col1'] in A.loc[A['ID']==row['ID'], 'Col2'] else 'KO', axis=1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM