
How can I match values from different dataframes based on some conditions or function using pandas?

Suppose I have two dataframes as below.

import pandas as pd

raw_data = {
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday', 'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK']
}

raw_data_2 = {
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b','c','d','e']
}

df1 = pd.DataFrame(raw_data, columns = ['name', 'nationality'])
df2 = pd.DataFrame(raw_data_2, columns = ['name_2', 'nationality', 'code'])

I want to match rows of the two dataframes based on two conditions:

  1. the name from raw_data_2 (name_2), split on spaces, is a subset of the name from raw_data, also split on spaces, and
  2. the nationality is the same in both.

For example, 'Jason you'.split(' ') = ['Jason', 'you'] from raw_data_2 is a subset of 'Jason love you'.split(' ') = ['Jason', 'love', 'you'] . But 'Molly care wist'.split(' ') is NOT a subset of 'Molly hope wish care'.split(' ') because 'wist' does not appear in the latter, so the latter does not cover the former entirely. 'tiger bird'.split(' ') from raw_data_2 is a subset of 'tiger legend bird'.split(' ') , but the nationalities differ ('JK' vs 'UK').
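The subset test described above can be sketched with plain Python sets, before pandas comes into it. This is only an illustration: the variable names (full, part, typo, molly) are made up for the example, and the strings are the ones from the question.

```python
# Word sets built from the example names in the question.
full = set('Jason love you'.split())
part = set('Jason you'.split())
molly = set('Molly hope wish care'.split())
typo = set('Molly care wist'.split())

# For sets, `a <= b` is True when every element of a is also in b.
is_subset = part <= full          # 'Jason' and 'you' both appear in full
is_subset_typo = typo <= molly    # 'wist' does not appear in molly

print(is_subset, is_subset_typo)  # True False
```

Word order and repeated words are ignored by this test, since sets keep neither.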

When both conditions are met, I want to assign the code value from raw_data_2 . So the desired output (taking just the codes) would be:

'a' (matched), NaN (unmatched), NaN (unmatched), 'd' (matched), NaN (unmatched)

How can I do this with pandas? I suspect it is not as simple as using the 'isin' or 'map' function.

Use the <= operator on sets to test for a subset, row by row

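# Turn each name into a set of its space-separated words
name = df1.name.str.split().apply(set)
name2 = df2.name_2.str.split().apply(set)
# Row by row: is the df2 word set a subset of the df1 word set?
cond1 = name2 <= name
# Row by row: do the nationalities agree?
cond2 = df1.nationality == df2.nationality

# Put the frames side by side and keep rows where both conditions hold
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).loc[cond1 & cond2]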

              df1                    df2                 
             name nationality     name_2 nationality code
0  Jason love you         USA  Jason you         USA    a
3         dog cat          UK        dog          UK    d
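The result above keeps only the matched rows. To get the code column in the shape the question asks for (a code where matched, NaN otherwise), one possible sketch is to mask df2's code column with the combined condition using Series.where. It assumes, like the answer, that both frames share the same index, so rows are compared position by position. The explicit zip loop is a substitution for the <= comparison on Series of sets and should produce the same mask.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday',
             'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
})
df2 = pd.DataFrame({
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b', 'c', 'd', 'e'],
})

# Condition 1: words of name_2 form a subset of the words of name (same row).
subset = pd.Series(
    [set(b.split()) <= set(a.split()) for a, b in zip(df1['name'], df2['name_2'])],
    index=df1.index,
)
# Condition 2: nationalities agree (same row).
same_nat = df1['nationality'] == df2['nationality']

# Keep the code where both conditions hold; other rows become NaN.
df1['code'] = df2['code'].where(subset & same_nat)
print(df1)
```

With the sample data, rows 0 and 3 get 'a' and 'd' and the remaining rows get NaN. If a name in df2 may match a row of df1 at a different position, this row-aligned comparison is not enough and a pairwise comparison (for example after merging on nationality) would be needed instead.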

