Create a new column based on matching strings in columns of two different DataFrames
I have two DataFrames, A and B, and I want to match their name columns. If a name in A exists in B, I need to create a new column in A holding the id from B; if it does not exist, the new column should be 0.
Here is my data and the code I wrote:
#data B
email           name       id
hi@amal.com     amal call  6
hi@hotmail.com  amal       6
hi@gmail.com    AMAL boy   6
hi@boy.com      boy        7
hi@hotmail.com  boy        7
hi@call.com     call AMAL  9
hi@hotmail.com  boy        7
hi@dog.com      dog        8
hi@outlook.com  dog        8
hi@gmail.com    dog        8
#data A
id  name
1   amal
1   AMAL
2   call
4   dog
3   boy
First, I created a contains check:
A.name.str.contains('|'.join(B.name))
Then I tried to create a column:
A["new"] = np.where(A.name.str.contains('|'.join(B.name))==True, B.id, 0)
But I get this error:
ValueError: operands could not be broadcast together with shapes (5,) (10,) ()
What I expected is:
id name new
1 amal 6
1 AMAL 0
2 call 0
4 dog 8
3 boy 7
Any help?
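The ValueError comes from np.where, which needs its three arguments to broadcast to a common shape. The condition has 5 elements (one per row of A), while B.id has 10 (one per row of B), so the two cannot be paired up element by element. A minimal sketch with stand-in arrays of the same lengths (the names cond and b_ids are illustrative, not from the question) reproduces the failure:

```python
import numpy as np

# Stand-in arrays with the same lengths as in the question:
# one condition entry per row of A (5), one id per row of B (10).
cond = np.array([True, False, False, True, True])
b_ids = np.array([6, 6, 6, 7, 7, 9, 7, 8, 8, 8])

try:
    np.where(cond, b_ids, 0)
    broadcast_failed = False
except ValueError as exc:
    # Shapes (5,) and (10,) cannot be broadcast together.
    broadcast_failed = True
    print(exc)

# When the value array has the same length as the condition,
# np.where picks element by element without complaint.
same_len = np.where(cond, np.array([6, 6, 9, 8, 7]), 0)
print(same_len)
```

So the ids have to be aligned to the rows of A first, which is what a lookup by name does.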
Use Series.map with a Series built from B after removing duplicated names with DataFrame.drop_duplicates, then replace the missing values with Series.fillna and convert to integers:
A["new"] = A.name.map(B.drop_duplicates('name').set_index('name')['id']).fillna(0).astype(int)
print (A)
   id  name  new
0   1  amal    6
1   1  AMAL    0
2   2  call    0
3   4   dog    8
4   3   boy    7
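For anyone who wants to run this end to end, here is a self-contained sketch. The DataFrames are rebuilt by hand from the tables in the question, assuming that multi-word entries such as "amal call" are single values of the name column; the intermediate variable lookup is introduced only for readability.

```python
import pandas as pd

# Data B, reconstructed from the question (multi-word names assumed to be one value).
B = pd.DataFrame({
    "email": ["hi@amal.com", "hi@hotmail.com", "hi@gmail.com", "hi@boy.com",
              "hi@hotmail.com", "hi@call.com", "hi@hotmail.com", "hi@dog.com",
              "hi@outlook.com", "hi@gmail.com"],
    "name": ["amal call", "amal", "AMAL boy", "boy", "boy",
             "call AMAL", "boy", "dog", "dog", "dog"],
    "id": [6, 6, 6, 7, 7, 9, 7, 8, 8, 8],
})

# Data A, reconstructed from the question.
A = pd.DataFrame({"id": [1, 1, 2, 4, 3],
                  "name": ["amal", "AMAL", "call", "dog", "boy"]})

# Keep the first row per name so the index is unique, then map name -> id.
lookup = B.drop_duplicates("name").set_index("name")["id"]

# Names missing from the lookup become NaN; fill with 0 and cast back to int.
A["new"] = A["name"].map(lookup).fillna(0).astype(int)
print(A)
```

The match is exact and case-sensitive, which is why "AMAL" and "call" get 0: B contains "amal", "AMAL boy" and "call AMAL", but not those exact strings.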