
How can I match values from different dataframes based on some conditions or function using pandas?

Suppose I have two dataframes as below.

import pandas as pd

raw_data = {
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday', 'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK']
}

raw_data_2 = {
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b', 'c', 'd', 'e']
}

df1 = pd.DataFrame(raw_data, columns=['name', 'nationality'])
df2 = pd.DataFrame(raw_data_2, columns=['name_2', 'nationality', 'code'])

What I want to do is match the two dataframes based on some conditions. The conditions are:

  1. a name from raw_data_2 must be a subset of a name from raw_data when both names are split by spaces, and
  2. the nationality must be the same.

For easier understanding, here's an example: from raw_data_2, 'Jason you'.split(' ') = ['Jason', 'you'], so this is a subset of 'Jason love you'.split(' ') = ['Jason', 'love', 'you']. But 'Molly care wist'.split(' ') is NOT a subset of 'Molly hope wish care'.split(' '), because the latter does not contain every word of the former ('wist' is missing). 'tiger bird'.split(' ') from raw_data_2 is a subset of 'tiger legend bird'.split(' '), but their nationalities differ ('JK' vs. 'UK').
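
The subset rule described above can be sketched in plain Python before bringing pandas into it. The helper name is_token_subset is made up for illustration and is not part of the original question; it assumes names are split on whitespace and compared case-sensitively.

```python
def is_token_subset(candidate: str, full: str) -> bool:
    """Return True if every whitespace-separated word of `candidate` occurs in `full`.

    Hypothetical helper for illustration; the comparison is case-sensitive.
    """
    # For Python sets, `a <= b` is the subset test (same as a.issubset(b)).
    return set(candidate.split()) <= set(full.split())


print(is_token_subset('Jason you', 'Jason love you'))              # every word is present
print(is_token_subset('Molly care wist', 'Molly hope wish care'))  # 'wist' is missing
print(is_token_subset('tiger bird', 'tiger legend bird'))          # subset; nationality is checked separately
```

Note that this only covers condition 1; the nationality check is a separate comparison.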

If the above conditions are met, I then want to assign the code value from raw_data_2. So the desired output (taking just the code values) would be:

'a' (matched), NaN (unmatched), NaN (unmatched), 'd' (matched), NaN (unmatched)

How can I do this using pandas? I guess this is not as simple as the 'isin' function or the 'map' function.

Using the <= operator to test for subset

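# Turn each name into a set of whitespace-separated words
name = df1.name.str.split().apply(set)
name2 = df2.name_2.str.split().apply(set)
# Element-wise subset test: name_2 words against the name words in the same row
cond1 = name2 <= name
# Nationality must match in the same row
cond2 = df1.nationality == df2.nationality

# Show the rows of both frames that satisfy both conditions, side by side
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).loc[cond1 & cond2]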

              df1                    df2                 
             name nationality     name_2 nationality code
0  Jason love you         USA  Jason you         USA    a
3         dog cat          UK        dog          UK    d
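
To get the code column the question asks for, rather than the side-by-side view, one possible follow-up is to mask df2's code with the combined condition. This is a minimal sketch and not part of the original answer: it rebuilds the two frames so it runs on its own, and it spells the subset test out with zip instead of comparing the two Series directly. Like the answer, it assumes both frames have the same length and are compared row by row.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Jason love you', 'Molly hope wish care', 'happy birthday', 'dog cat', 'tiger legend bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
})
df2 = pd.DataFrame({
    'name_2': ['Jason you', 'Molly care wist', 'hapy birthday', 'dog', 'tiger bird'],
    'nationality': ['USA', 'USA', 'France', 'UK', 'JK'],
    'code': ['a', 'b', 'c', 'd', 'e'],
})

# Condition 1: words of name_2 are a subset of the words of name (same row).
is_subset = pd.Series(
    [set(short.split()) <= set(full.split())
     for full, short in zip(df1['name'], df2['name_2'])],
    index=df1.index,
)
# Condition 2: nationality is identical (same row).
same_nationality = df1['nationality'] == df2['nationality']

# Keep the code where both conditions hold; everything else becomes NaN.
df1['code'] = df2['code'].where(is_subset & same_nationality)
print(df1)
```

Because the comparison is positional, this only pairs row i of df1 with row i of df2. If a name in df2 may match a name in any row of df1, the frames would first have to be paired up differently (for example by merging on nationality), which is outside what the answer above does.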
