Suppose I have two dataframes as below.
import pandas as pd

raw_data = {
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday', 'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK']
}
raw_data_2 = {
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b', 'c', 'd', 'e']
}
df1 = pd.DataFrame(raw_data, columns=['name', 'nationality'])
df2 = pd.DataFrame(raw_data_2, columns=['name_2', 'nationality', 'code'])
What I want to do is match the two dataframes based on some conditions. The first condition is that the name_2 value from raw_data_2 is a subset of the name value from raw_data when both names are split on spaces. For easier understanding, here's an example: from raw_data_2, 'Jason you'.split(' ') = ['Jason', 'you'], which is a subset of 'Jason love you'.split(' ') = ['Jason', 'love', 'you']. But 'Molly care wist'.split(' ') is NOT a subset of 'Molly hope wish care'.split(' '), because the latter does not contain every word of the former ('wist' is missing). The second condition is that the nationalities are equal: 'tiger bird'.split(' ') from raw_data_2 is a subset of 'tiger legend bird'.split(' '), but their nationalities differ, so that row does not match.
If both conditions are met, I want to assign the code value from raw_data_2. So the desired output (taking just the codes) would be:
'a' (matched), NaN (unmatched), NaN (unmatched), 'd' (matched), NaN (unmatched)
How can I do this with pandas? I guess it is not as simple as using the 'isin' or 'map' functions.
Using the <= operator to test for subset:
name = df1.name.str.split().apply(set)
name2 = df2.name_2.str.split().apply(set)
cond1 = name2 <= name
cond2 = df1.nationality == df2.nationality
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).loc[cond1 & cond2]
              df1                    df2
             name nationality     name_2 nationality code
0  Jason love you         USA  Jason you         USA    a
3         dog cat          UK        dog          UK    d
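The filter above keeps only the matching rows, whereas the question asks for a code on every row, with NaN where there is no match. One way to get that shape might be Series.where, sketched below. This is only a sketch: it assumes, as the answer above does, that the two frames line up row by row on the same index, and the names words1, words2, is_subset, same_nationality and result are illustrative rather than taken from the original post. The subset test is written as an explicit loop over paired sets, which does the same comparison as <= on the two Series.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday', 'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
})
df2 = pd.DataFrame({
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b', 'c', 'd', 'e'],
})

# Split each name on whitespace and turn the words into a set.
words1 = df1['name'].str.split().apply(set)
words2 = df2['name_2'].str.split().apply(set)

# Row-wise subset test: set2 <= set1 is True when every word of name_2 appears in name.
# Assumption: rows of df1 and df2 correspond to each other by position.
is_subset = pd.Series(
    [w2 <= w1 for w1, w2 in zip(words1, words2)],
    index=df1.index,
)
same_nationality = df1['nationality'] == df2['nationality']

# Keep the code where both conditions hold; everything else becomes NaN.
result = df1.assign(code=df2['code'].where(is_subset & same_nationality))
print(result)
```

With the sample data, the code column comes out as 'a', NaN, NaN, 'd', NaN, which is the output the question asks for. If the two frames were not aligned row by row, a different approach (for example a cross join followed by the same two tests) would be needed.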