Match string values in two different DataFrames Pandas

I have two dataframes (df1, df2) and I would like to create a new column in df1 that indicates whether there is a match, a likely match, or a mismatch across several columns of the two dataframes. df1:

id  a   b   c   d   name
a1  94  18  10  20  b1
a2  20  18  1   2   b4,b5
a3  21  18  34  32  b2,b3,b4
a4  216 5   56  76  b5
a5  210 5   10  30  b4,b5

df2:

id  a   b   c   d
b1  94  5   10  20
b2  A150    5   13  45
b3  167 5   4   -1
b4  210 5   40  80
b5  216 5   60  80

Basically, name holds ids from df2. I would like to match the name of df1 to the id of df2 and, based on the following conditions, create a new column.

Match: df1['a','b','c','d'] = df2['a','b','c','d']
Likely match: df1['a','b'] = df2['a','b'] and c or d is within +-10
Missmatch: df1['a','b'] = df2['a','b'] but c and d differ by more than +-10
Missing: the df1 record is not in df2
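One way to read these four rules is as a check on a single pair of rows. The sketch below is only an illustration: the function name classify and the tol parameter are hypothetical, it assumes that both c and d must be within the tolerance for a likely match, and it assumes that a pair whose a or b values differ counts as Missing (the rules above do not say).

```python
def classify(left, right, tol=10):
    """Classify one df1 row against one df2 row, both given as dicts.

    Assumptions (not stated in the question): both c and d must be
    within `tol` for 'Likely', and differing a/b values mean 'Missing'.
    """
    if left["a"] != right["a"] or left["b"] != right["b"]:
        return "Missing"
    diff_c = abs(left["c"] - right["c"])
    diff_d = abs(left["d"] - right["d"])
    if diff_c == 0 and diff_d == 0:
        return "Match"
    if diff_c <= tol and diff_d <= tol:
        return "Likely"
    return "Missmatch"


# Rows a4 and a5 from df1 checked against b5 and b4 from df2.
a4 = {"a": 216, "b": 5, "c": 56, "d": 76}
a5 = {"a": 210, "b": 5, "c": 10, "d": 30}
b4 = {"a": 210, "b": 5, "c": 40, "d": 80}
b5 = {"a": 216, "b": 5, "c": 60, "d": 80}

print(classify(a4, b5))  # Likely
print(classify(a5, b4))  # Missmatch
```

Under these assumptions, a4 against b5 is a likely match (c and d each differ by 4), while a5 against b4 is a mismatch (c differs by 30 and d by 50).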

Expected result:

id  a   b   c   d   name      Status
a1  94  18  10  20  b1        Match
a2  20  18  1   2   b2,b3     Missing
a3  21  18  34  32  b2,b3,b4  Missing
a4  210 5   10  30  b4,b5     Missmatch
a5  216 5   56  76  b5        Likely

Your expected result is wrong: you swapped the column values of the rows df1['id'] == 'a4' and df1['id'] == 'a5', and your name values differ from those in df1. Nevertheless, you can use np.select:

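import numpy as np

# Attach the split name lists to df2; the assignment aligns on the row index.
df2['name'] = df1['name'].str.split(',')

# Each condition compares df1 and df2 row by row (aligned on the index).
conditions = [
    # the df2 id is listed in name and at least one of a, b, c, d is equal
    ((df2.apply(lambda x: x['id'] in x['name'], axis=1)) & (df1[['a','b','c','d']] == df2[['a','b','c','d']]).any(axis=1)),
    # a or b is equal and c or d differs by at most 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) <= 10).any(axis=1)),
    # a or b is equal and c or d differs by at least 10
    ((df1[['a','b']] == df2[['a','b']]).any(axis=1) & (abs(df2[['c','d']] - df1[['c','d']]) >= 10).any(axis=1)),
]

choices = [
    'Match',
    'Likely',
    'Missmatch',
]

# The first condition that holds wins; rows matching none become 'Missing'.
df1['Status'] = np.select(conditions, choices, default='Missing')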

Result:

    id  a   b   c   d   name      Status
0   a1  94  18  10  20  b1         Match
1   a2  20  18  1   2   b4,b5      Missing
2   a3  21  18  34  32  b2,b3,b4   Missing
3   a4  216 5   56  76  b5         Likely
4   a5  210 5   10  30  b4,b5      Match
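Note that the conditions above compare df1 and df2 row by row on their shared index, so each df1 row is only checked against the df2 row at the same position, not against every id listed in name. A minimal sketch of an alternative, which pairs each name entry with its df2 row through explode and merge, is shown below. It is not the answer's method: the frames are small made-up data loosely modeled on the question, the helper classify and the ORDER list are hypothetical, and it assumes that both c and d must be within 10 for Likely, that differing a or b values mean Missing, and that the best status among all candidates wins.

```python
import pandas as pd

# Made-up frames shaped like df1/df2 in the question (values are illustrative).
df1 = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4"],
    "a": [94, 216, 210, 20],
    "b": [5, 5, 5, 18],
    "c": [10, 56, 10, 1],
    "d": [20, 76, 30, 2],
    "name": ["b1", "b5", "b4,b5", "b4,b9"],
})
df2 = pd.DataFrame({
    "id": ["b1", "b4", "b5"],
    "a": [94, 210, 216],
    "b": [5, 5, 5],
    "c": [10, 40, 60],
    "d": [20, 80, 80],
})

# Statuses from best to worst; used to pick one status per df1 row.
ORDER = ["Match", "Likely", "Missmatch", "Missing"]

def classify(row, tol=10):
    """Status of one (df1 row, df2 candidate) pair after the merge."""
    # A NaN in a_df2 means the candidate id was not found in df2.
    if pd.isna(row["a_df2"]) or row["a"] != row["a_df2"] or row["b"] != row["b_df2"]:
        return "Missing"
    diff_c = abs(row["c"] - row["c_df2"])
    diff_d = abs(row["d"] - row["d_df2"])
    if diff_c == 0 and diff_d == 0:
        return "Match"
    if diff_c <= tol and diff_d <= tol:
        return "Likely"
    return "Missmatch"

# One row per (df1 row, candidate id), joined to the matching df2 row.
pairs = df1.assign(name=df1["name"].str.split(",")).explode("name")
pairs = pairs.merge(df2, left_on="name", right_on="id",
                    how="left", suffixes=("", "_df2"))
pairs["rank"] = pairs.apply(classify, axis=1).map(ORDER.index)

# Keep the best (lowest) rank among each df1 row's candidates.
best_rank = pairs.groupby("id")["rank"].min()
df1["Status"] = df1["id"].map(best_rank).map(lambda r: ORDER[int(r)])
print(df1[["id", "name", "Status"]])
```

With this made-up data, a3 lists both b4 and b5; only b4 shares its a and b values, and its c and d values are too far apart, so a3 ends up as Missmatch. Whether the best or the worst candidate should decide the status is a design choice the question leaves open.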

