I have two dataframes (df1, df2) and I would like to create a new column in df1 that indicates whether there is a match, likely match, or mismatch across multiple columns between the two dataframes.
df1:
id a b c d name
a1 94 18 10 20 b1
a2 20 18 1 2 b4,b5
a3 21 18 34 32 b2,b3,b4
a4 216 5 56 76 b5
a5 210 5 10 30 b4,b5
df2:
id a b c d
b1 94 5 10 20
b2 A150 5 13 45
b3 167 5 4 -1
b4 210 5 40 80
b5 216 5 60 80
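To reproduce the question, the two tables can be built like this. This is a sketch: the values are copied from the tables above, including the A150 entry in df2, which turns column a of df2 into an object column.

```python
import pandas as pd

# df1 as posted; `name` holds one or more comma-separated df2 ids
df1 = pd.DataFrame({
    "id":   ["a1", "a2", "a3", "a4", "a5"],
    "a":    [94, 20, 21, 216, 210],
    "b":    [18, 18, 18, 5, 5],
    "c":    [10, 1, 34, 56, 10],
    "d":    [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})

# df2 as posted; "A150" is kept verbatim, so column `a` becomes object dtype
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a":  [94, "A150", 167, 210, 216],
    "b":  [5, 5, 5, 5, 5],
    "c":  [10, 13, 4, 40, 60],
    "d":  [20, 45, -1, 80, 80],
})

print(df1)
print(df2)
```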
Basically, the name column of df1 holds ids from df2 (comma-separated when there are several). I would like to match each name in df1 to the id in df2 and, based on the following conditions, create a new column:
Match: df1[['a','b','c','d']] == df2[['a','b','c','d']]
Likely match: df1[['a','b']] == df2[['a','b']] and c or d is within ±10
Mismatch: df1[['a','b']] == df2[['a','b']] but both c and d differ by more than 10
Missing: the df1 record is not in df2
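One way to read these rules, written as a plain function that classifies a single df1 row against a single df2 row. This is a sketch under assumptions: classify_pair and tol are hypothetical names, "c or d within 10" is read literally (at least one of the two), and a pair whose a or b values differ is labelled Missing, since the rules don't cover that case.

```python
def classify_pair(left, right, tol=10):
    """Classify one df1 row (`left`) against one df2 row (`right`).

    Both arguments are mappings with keys 'a', 'b', 'c', 'd'.
    """
    # All four columns equal
    if all(left[k] == right[k] for k in ("a", "b", "c", "d")):
        return "Match"
    # Assumption: the rules only apply when a and b agree
    if left["a"] != right["a"] or left["b"] != right["b"]:
        return "Missing"
    dc = abs(left["c"] - right["c"])
    dd = abs(left["d"] - right["d"])
    # At least one of c, d within tol
    if dc <= tol or dd <= tol:
        return "Likely"
    # Both c and d are more than tol apart
    return "Mismatch"


print(classify_pair({"a": 216, "b": 5, "c": 56, "d": 76},
                    {"a": 216, "b": 5, "c": 60, "d": 80}))  # Likely
print(classify_pair({"a": 210, "b": 5, "c": 10, "d": 30},
                    {"a": 210, "b": 5, "c": 40, "d": 80}))  # Mismatch
```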
Expected result:
id a b c d name Status
a1 94 18 10 20 b1 Match
a2 20 18 1 2 b2,b3 Missing
a3 21 18 34 32 b2,b3,b4 Missing
a4 210 5 10 30 b4,b5 Mismatch
a5 216 5 56 76 b5 Likely
Your expected result is wrong: you swapped the values in the rows where df1['id'] == 'a4' and df1['id'] == 'a5'.
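A quick arithmetic check with the values from df1 and df2 supports this: under the stated rules, the row with a=216 (a4, name b5) comes out as Likely and the row with a=210 (a5, name b4,b5) as Mismatch. The helper diffs is hypothetical, only for illustration.

```python
# Values copied from the question's df1 and df2 tables
a4 = {"a": 216, "b": 5, "c": 56, "d": 76}   # name: b5
a5 = {"a": 210, "b": 5, "c": 10, "d": 30}   # name: b4,b5
b4 = {"a": 210, "b": 5, "c": 40, "d": 80}
b5 = {"a": 216, "b": 5, "c": 60, "d": 80}


def diffs(x, y):
    """Return (a and b both equal?, |c difference|, |d difference|)."""
    return (x["a"] == y["a"] and x["b"] == y["b"],
            abs(x["c"] - y["c"]), abs(x["d"] - y["d"]))


print(diffs(a4, b5))  # (True, 4, 4): within 10, so Likely
print(diffs(a5, b4))  # (True, 30, 50): both beyond 10, so Mismatch
print(diffs(a5, b5))  # (False, 50, 50): a differs (210 vs 216)
```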
Your column names are also different. Nevertheless, you can use np.select:
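import numpy as np

# Note: this pairs row i of df1 with row i of df2 by position;
# it does not look up the ids listed in df1['name'].
df2['name'] = df1['name'].str.split(',')
conditions = [
    # the df2 row's id is in the df1 row's name list, and at least one of a-d is equal
    ((df2.apply(lambda x: x['id'] in x['name'], axis=1)) & (df1[['a','b','c','d']] == df2[['a','b','c','d']]).any(axis=1)),
    # at least one of a/b is equal, and at least one of c/d is within 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & ((df2[['c','d']] - df1[['c','d']]).abs() <= 10).any(axis=1)),
    # at least one of a/b is equal, and at least one of c/d is more than 10 apart
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & ((df2[['c','d']] - df1[['c','d']]).abs() > 10).any(axis=1)),
]
choices = [
    'Match',
    'Likely',
    'Mismatch',
]
# the first True condition wins; rows matching none get 'Missing'
df1['Status'] = np.select(conditions, choices, default='Missing')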
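np.select checks the conditions in list order and, element by element, takes the choice belonging to the first condition that is True; default fills the rest. So when conditions overlap, their order decides the label. A minimal illustration on a plain NumPy array (the array and labels are made up):

```python
import numpy as np

x = np.array([0, 5, 10, 15])
# 10 satisfies both x <= 10 and x >= 10; the earlier condition wins
conditions = [x == 0, x <= 10, x >= 10]
choices = ["zero", "small", "large"]
result = np.select(conditions, choices, default="none")
print(result)  # ['zero' 'small' 'small' 'large']
```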
Result:
id a b c d name Status
0 a1 94 18 10 20 b1 Match
1 a2 20 18 1 2 b4,b5 Missing
2 a3 21 18 34 32 b2,b3,b4 Missing
3 a4 216 5 56 76 b5 Likely
4 a5 210 5 10 30 b4,b5 Match
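The np.select code above compares row i of df1 with row i of df2 by position (it copies df1['name'] onto df2 row by row), so it never actually looks up the ids listed in name; a4, for example, is checked against b4 even though its name is b5. Below is a sketch of an id-based alternative: split name, explode to one row per listed id (DataFrame.explode needs pandas 0.25 or later), merge with df2, classify each pair with np.select, then keep the best status per df1 row. These are assumptions, not something the question specifies: the best-status order (Match, then Likely, then Mismatch, then Missing), the literal reading of "c or d within 10", and treating pairs whose a or b differ as Missing.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4", "a5"],
    "a": [94, 20, 21, 216, 210],
    "b": [18, 18, 18, 5, 5],
    "c": [10, 1, 34, 56, 10],
    "d": [20, 2, 32, 76, 30],
    "name": ["b1", "b4,b5", "b2,b3,b4", "b5", "b4,b5"],
})
df2 = pd.DataFrame({
    "id": ["b1", "b2", "b3", "b4", "b5"],
    "a": [94, "A150", 167, 210, 216],
    "b": [5, 5, 5, 5, 5],
    "c": [10, 13, 4, 40, 60],
    "d": [20, 45, -1, 80, 80],
})

# One row per (df1 row, df2 id listed in its name); df2's columns get the suffix _2
pairs = (
    df1.assign(name=df1["name"].str.split(","))
       .explode("name")
       .merge(df2.rename(columns={"id": "name"}), on="name",
              how="left", suffixes=("", "_2"))
)

same_ab = (pairs["a"] == pairs["a_2"]) & (pairs["b"] == pairs["b_2"])
same_cd = (pairs["c"] == pairs["c_2"]) & (pairs["d"] == pairs["d_2"])
close_c = (pairs["c"] - pairs["c_2"]).abs() <= 10
close_d = (pairs["d"] - pairs["d_2"]).abs() <= 10

# First True condition wins; pairs whose a or b differ, and ids not
# found in df2 (NaN after the left merge), fall through to the default
pairs["Status"] = np.select(
    [same_ab & same_cd, same_ab & (close_c | close_d), same_ab],
    ["Match", "Likely", "Mismatch"],
    default="Missing",
)

# Assumption: when a row lists several ids, keep its best status
order = ["Match", "Likely", "Mismatch", "Missing"]
rank = pairs["Status"].map({s: i for i, s in enumerate(order)})
best = rank.groupby(pairs["id"]).min().map(dict(enumerate(order)))
df1["Status"] = df1["id"].map(best)
print(df1)
```

On the sample data this gives Missing for a1, a2 and a3, Likely for a4 and Mismatch for a5. Read literally, the rules put a1 under Missing rather than Match, because its b value (18) differs from b1's (5).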