How can I use Pandas to match values from different dataframes based on some condition or function?

Question

Suppose I have two dataframes as below.

import pandas as pd

raw_data = {
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday', 'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK']
}

raw_data_2 = {
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b','c','d','e']
}

df1 = pd.DataFrame(raw_data, columns = ['name', 'nationality'])
df2 = pd.DataFrame(raw_data_2, columns = ['name_2', 'nationality', 'code'])

What I want to do is match the two dataframes based on some conditions. The conditions here are that

a name from raw_data_2 is a subset of a name from raw_data when both names are split on spaces, and
the nationalities are the same.

For easier understanding, here's an example: from raw_data_2, 'Jason you'.split(' ') = ['Jason', 'you'], which is a subset of 'Jason love you'.split(' ') = ['Jason', 'love', 'you']. But 'Molly care wist'.split(' ') is NOT a subset of 'Molly hope wish care'.split(' '), because the latter does not contain every word of the former ('wist' is missing). 'tiger bird'.split(' ') from raw_data_2 is a subset of 'tiger legend bird'.split(' '), but their nationalities differ.

If the above conditions are met, I want to assign the code value from raw_data_2. So the desired output (taking just the codes) would be:

'a' (matched), NaN (unmatched), NaN (unmatched), 'd' (matched), NaN (unmatched)

How can I do this using pandas? I guess this is not as simple as using the 'isin' or 'map' function.

Answer 1

Using the <= operator to test for subset
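As a quick illustration of what <= means for plain Python sets (this is standard set behavior, independent of pandas), here is a minimal sketch using two pairs of names from the question. The variable names are made up for the example.

```python
# For Python sets, `a <= b` is True when every element of a is also in b.
small = set('Jason you'.split())
large = set('Jason love you'.split())
print(small <= large)   # True

# 'wist' is not among the words of the longer name, so this is not a subset.
typo = set('Molly care wist'.split())
full = set('Molly hope wish care'.split())
print(typo <= full)     # False
```

The pandas code that follows applies the same comparison to each pair of rows.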

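# Turn each name into a set of its space-separated words
name = df1.name.str.split().apply(set)
name2 = df2.name_2.str.split().apply(set)
# Row by row: is the df2 name a subset of the df1 name?
cond1 = name2 <= name
# Row by row: do the nationalities agree?
cond2 = df1.nationality == df2.nationality

# Show both frames side by side, keeping only rows where both conditions hold
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).loc[cond1 & cond2]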

              df1                    df2                 
             name nationality     name_2 nationality code
0  Jason love you         USA  Jason you         USA    a
3         dog cat          UK        dog          UK    d
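The question asks for the codes themselves, with NaN for unmatched rows. One possible way to get that from the same two conditions is sketched below. It assumes, as the answer does, that row i of df1 is compared only with row i of df2. The names is_subset, same_nationality and matched_code are introduced here for illustration, and the subset test is written as an explicit loop over paired rows.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday',
             'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
})
df2 = pd.DataFrame({
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b', 'c', 'd', 'e'],
})

# Row-by-row subset test on the space-separated words of each pair of names.
is_subset = pd.Series(
    [set(n2.split()) <= set(n1.split())
     for n1, n2 in zip(df1['name'], df2['name_2'])],
    index=df1.index,
)
same_nationality = df1['nationality'] == df2['nationality']

# Keep the code where both conditions hold; other rows become missing values.
matched_code = df2['code'].where(is_subset & same_nationality)
print(matched_code)
```

With the sample data, rows 0 and 3 keep their codes ('a' and 'd') and the remaining rows are missing. If a name in df2 should be compared against every row of df1 rather than only the row at the same position, a different approach would be needed.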

Solution 1
1 2017-01-14 14:37:38