Create a new column based on matching strings in columns of two different data frames

Question

I have two data frames, A and B, and I want to match the name columns between them. If a name from data set A exists in data set B, I need to create a new column in data set A holding the id from data set B; if it does not exist, the column should be 0.

Here are the data and the code I wrote:

#data B
    email              name        id
    hi@amal.com       amal call     6
    hi@hotmail.com      amal        6
    hi@gmail.com        AMAL boy    6
    hi@boy.com          boy         7
    hi@hotmail.com      boy         7
    hi@call.com     call AMAL       9
    hi@hotmail.com      boy         7
    hi@dog.com          dog         8
    hi@outlook.com      dog         8
    hi@gmail.com        dog         8



#data A

    id  name
    1   amal
    1   AMAL
    2   call
    4   dog
    3   boy

First I built a contains check:

A.name.str.contains('|'.join(B.name))

Then I tried to create the column:

A["new"] = np.where(A.name.str.contains('|'.join(B.name))==True, B.id, 0)

But I get this error:

ValueError: operands could not be broadcast together with shapes (5,) (10,) ()

What I expected is:

    id  name  new
    1   amal  6
    1   AMAL  0
    2   call  0
    4   dog   8
    3   boy   7

Any help?

Answer 1

Use Series.map with a Series built from B after removing the duplicated names with DataFrame.drop_duplicates, then replace the missing values with Series.fillna and convert to integers:

A["new"] = A.name.map(B.drop_duplicates('name').set_index('name')['id']).fillna(0).astype(int)
print(A)
   id  name  new
0   1  amal    6
1   1  AMAL    0
2   2  call    0
3   4   dog    8
4   3   boy    7
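
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds A and B from the tables in the question. The variable name `lookup` is only an illustrative helper and is not part of the original answer.

```python
import pandas as pd

# Data B: the lookup table (a name can repeat, always with the same id)
B = pd.DataFrame({
    "email": ["hi@amal.com", "hi@hotmail.com", "hi@gmail.com", "hi@boy.com",
              "hi@hotmail.com", "hi@call.com", "hi@hotmail.com", "hi@dog.com",
              "hi@outlook.com", "hi@gmail.com"],
    "name": ["amal call", "amal", "AMAL boy", "boy", "boy",
             "call AMAL", "boy", "dog", "dog", "dog"],
    "id": [6, 6, 6, 7, 7, 9, 7, 8, 8, 8],
})

# Data A: the frame that receives the new column
A = pd.DataFrame({
    "id": [1, 1, 2, 4, 3],
    "name": ["amal", "AMAL", "call", "dog", "boy"],
})

# Keep one row per distinct name, then build a name -> id Series.
# map() needs a unique index, which is why the duplicates are dropped first.
lookup = B.drop_duplicates("name").set_index("name")["id"]

# Exact, case-sensitive match: names missing from B become NaN, then 0.
A["new"] = A["name"].map(lookup).fillna(0).astype(int)
print(A)
```

Because map looks up each value of A.name individually, the result always has one entry per row of A. The np.where attempt fails because its condition has 5 elements (the rows of A) while B.id has 10, so the two cannot be broadcast together. Note also that this is an exact match: 'AMAL' and 'call' get 0 because B only contains them inside longer strings such as 'call AMAL'.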
