Merge two data frames after using str.contains?
I have two data frames. I want to match partial strings using the str.contains function and then merge them.
Here is an example:
data1
email is_mane name id
hi@amal.com 1 there is rain 10
hi2@amal.com 1 here is the food 9
hi3@amal.com 1 let's go together 8
hi4@amal.com 1 today is my birthday 6
data2
id name
1 the rain is beautiful
1 the food
2 together
4 my birthday
3 your birthday
And here is the code I wrote:
data1.loc[data1.name.str.contains('|'.join(data2.name)), :]
and the output:
email is_mane name id
hi2@amal.com 1 here is the food 9
hi3@amal.com 1 let's go together 8
hi4@amal.com 1 today is my birthday 6
As you can see, it did not return "there is rain" even though the word "rain" is contained in data2. Could it be because of the space?
I also want to merge data1 with data2 so that I can see which email has a match.
I would like to have the following output:
email is_mane name id id2 name2
hi2@amal.com 1 here is the food 9 1 the food
hi3@amal.com 1 let's go together 8 2 together
hi4@amal.com 1 today is my birthday 6 4 my birthday
hi4@amal.com 1 today is my birthday 6 3 your birthday
Is there any way to do it?
If you're fine with matching only full words (so e.g. dog and dogs won't match), you can do:
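# Split each name into words; any run of non-word characters is a separator.
data1["key"] = data1["name"].str.split(r"\W+")
data2["key"] = data2["name"].str.split(r"\W+")
# One row per word, join on equal words, then drop the helper column and duplicate pairs.
data3 = data1.explode("key").merge(data2.explode("key"), on="key", suffixes=["", "2"]).drop("key", axis=1).drop_duplicates()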
Otherwise, it's a matter of doing a cross join and then applying str.contains(...) to filter out the rows that don't match.
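The cross-join approach could be sketched roughly as follows. This is only an illustration under some assumptions: the frames are rebuilt from the sample data in the question, merge(how="cross") needs pandas 1.2 or later, and the names pairs, mask and result are made up for the example. Because str.contains takes one pattern for the whole column, the row-by-row check below uses Python's plain substring test (phrase in name) instead.

```python
import pandas as pd

# Sample frames rebuilt from the question (column names kept as posted).
data1 = pd.DataFrame({
    "email": ["hi@amal.com", "hi2@amal.com", "hi3@amal.com", "hi4@amal.com"],
    "is_mane": [1, 1, 1, 1],
    "name": ["there is rain", "here is the food",
             "let's go together", "today is my birthday"],
    "id": [10, 9, 8, 6],
})
data2 = pd.DataFrame({
    "id": [1, 1, 2, 4, 3],
    "name": ["the rain is beautiful", "the food", "together",
             "my birthday", "your birthday"],
})

# Cross join: pair every row of data1 with every row of data2.
# Overlapping columns from data2 get the suffix "2" (id2, name2).
pairs = data1.merge(data2, how="cross", suffixes=("", "2"))

# Keep only the pairs where the data2 phrase occurs literally
# inside the data1 name (a plain substring test, no regex).
mask = [phrase in name for name, phrase in zip(pairs["name"], pairs["name2"])]
result = pairs[mask].reset_index(drop=True)

print(result)
```

On the sample data this keeps three pairs ("the food", "together" and "my birthday"). "there is rain" is dropped because the test asks whether the whole phrase "the rain is beautiful" appears inside it, not whether the two share a word. For the same reason, the "your birthday" row from the desired output is not a substring match and would need word-level matching such as the split/explode approach. Note that a cross join builds len(data1) * len(data2) rows, so it can get expensive on large frames.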