
Create a new column depending on matching strings in columns of two different DataFrames

I have two DataFrames, A and B, and I want to match the name columns between them. If a name from A exists in B, I need to create a new column in A holding the corresponding id from B; if it does not exist, the new column should be 0.

Here is the data and the code I wrote:

#data B
    email              name        id
    hi@amal.com       amal call     6
    hi@hotmail.com      amal        6
    hi@gmail.com        AMAL boy    6
    hi@boy.com          boy         7
    hi@hotmail.com      boy         7
    hi@call.com     call AMAL       9
    hi@hotmail.com      boy         7
    hi@dog.com          dog         8
    hi@outlook.com      dog         8
    hi@gmail.com        dog         8



#data A

    id  name
    1   amal
    1   AMAL
    2   call
    4   dog
    3   boy

First I built a boolean mask with str.contains:

A.name.str.contains('|'.join(B.name))

Then I tried to create the column:

A["new"] = np.where(A.name.str.contains('|'.join(B.name))==True, B.id, 0)

But I get this error:

ValueError: operands could not be broadcast together with shapes (5,) (10,) ()

What I expected is:

    id  name  new
    1   amal  6
    1   AMAL  0
    2   call  0
    4   dog   8
    3   boy   7

Any help?
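The shapes in the error message can be traced back to the inputs: the mask has one entry per row of A (5), B.id has one entry per row of B (10), and the scalar 0 has shape (). np.where needs its three arguments to broadcast to a common shape, and 5 against 10 cannot. The sketch below rebuilds the sample data (the email column is left out, and the variable names mask and error_message are only illustrative) and reproduces the failure. Note also that the mask only says whether some name in B matched, not which row of B it matched, so it could not select the right id even if the lengths agreed.

```python
import numpy as np
import pandas as pd

# Sample data from the question (email column omitted for brevity).
A = pd.DataFrame({"id": [1, 1, 2, 4, 3],
                  "name": ["amal", "AMAL", "call", "dog", "boy"]})
B = pd.DataFrame({
    "name": ["amal call", "amal", "AMAL boy", "boy", "boy",
             "call AMAL", "boy", "dog", "dog", "dog"],
    "id": [6, 6, 6, 7, 7, 9, 7, 8, 8, 8],
})

# One boolean per row of A: does the name contain any name from B?
mask = A.name.str.contains("|".join(B.name))
print(mask.shape)  # length of A
print(B.id.shape)  # length of B

# np.where cannot align a length-5 condition with a length-10 array.
error_message = None
try:
    np.where(mask, B.id, 0)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```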

Use Series.map with a lookup Series built from B: remove the rows with duplicated names using DataFrame.drop_duplicates, set name as the index and select id. Then replace the missing values with Series.fillna and convert the result to integers:

A["new"] = A.name.map(B.drop_duplicates('name').set_index('name')['id']).fillna(0).astype(int)
print(A)
   id  name  new
0   1  amal    6
1   1  AMAL    0
2   2  call    0
3   4   dog    8
4   3   boy    7
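For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. It rebuilds the sample data by hand (the email column is omitted because it plays no part in the lookup), and the intermediate name lookup is only there for readability.

```python
import pandas as pd

# Sample data from the question (email column omitted).
A = pd.DataFrame({"id": [1, 1, 2, 4, 3],
                  "name": ["amal", "AMAL", "call", "dog", "boy"]})
B = pd.DataFrame({
    "name": ["amal call", "amal", "AMAL boy", "boy", "boy",
             "call AMAL", "boy", "dog", "dog", "dog"],
    "id": [6, 6, 6, 7, 7, 9, 7, 8, 8, 8],
})

# Build a name -> id lookup with one row per distinct name.
lookup = B.drop_duplicates("name").set_index("name")["id"]

# Exact, case-sensitive match on name; unmatched names become NaN,
# which is then replaced by 0 and cast back to integers.
A["new"] = A["name"].map(lookup).fillna(0).astype(int)
print(A)
```

drop_duplicates keeps the first row for each name, so if the same name appeared in B with different ids, only the first id would be used. In the sample data every repeated name carries a single id, so nothing is lost.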


