Merging two data frames after using str.contains?

Question

I have two data frames. I want to match partial strings using the str.contains function and then merge them.

Here is an example:

data1

    email           is_mane   name                    id
    hi@amal.com     1         there is rain           10
    hi2@amal.com    1         here is the food        9
    hi3@amal.com    1         let's go together       8
    hi4@amal.com    1         today is my birthday    6


data2

    id  name
    1   the rain is beautiful
    1   the food
    2   together
    4   my birthday
    3   your birthday

And here is the code I wrote:

data1.loc[data1["name"].str.contains("|".join(data2["name"])), :]

And the output:

    email           is_mane   name                    id
    hi2@amal.com    1         here is the food        9
    hi3@amal.com    1         let's go together       8
    hi4@amal.com    1         today is my birthday    6

As you can see, it did not return "there is rain", even though the word "rain" is contained in data2. Could it be because of the spaces?

I also want to merge data1 with data2, so that I can tell which email has a match.

I would like to have the following output:


    email           is_mane   name                    id    id2   name2
    hi2@amal.com    1         here is the food        9     1     the food
    hi3@amal.com    1         let's go together       8     2     together
    hi4@amal.com    1         today is my birthday    6     4     my birthday
    hi4@amal.com    1         today is my birthday    6     3     your birthday

Is there any way to do it?

Answer 1

If you're fine with matching only full words (so that, e.g., dog and dogs won't match), you can do:

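# Split each name into whole words (on runs of non-word characters)
data1["key"] = data1["name"].str.split(r"[^\w]+")
data2["key"] = data2["name"].str.split(r"[^\w]+")

# One row per word, join on shared words, then drop the helper column
data3 = (
    data1.explode("key")
    .merge(data2.explode("key"), on="key", suffixes=("", "2"))
    .drop("key", axis=1)
    .drop_duplicates()
)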
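
For reference, here is a self-contained sketch of the word-level merge on the sample data from the question. The frames are typed in by hand from the question's tables. The helper name `merge_on_shared_words` and the `stop_words` argument are assumptions added for illustration and are not part of the answer. The stop-word filter is there because, without it, common words such as "is" and "the" also count as shared words, so more pairs come back than the question's desired output.

```python
import pandas as pd

data1 = pd.DataFrame({
    "email": ["hi@amal.com", "hi2@amal.com", "hi3@amal.com", "hi4@amal.com"],
    "is_mane": [1, 1, 1, 1],
    "name": ["there is rain", "here is the food",
             "let's go together", "today is my birthday"],
    "id": [10, 9, 8, 6],
})
data2 = pd.DataFrame({
    "id": [1, 1, 2, 4, 3],
    "name": ["the rain is beautiful", "the food", "together",
             "my birthday", "your birthday"],
})


def merge_on_shared_words(left, right, col="name", stop_words=()):
    """Pair rows of `left` and `right` whose `col` values share a whole word.

    Hypothetical helper: `stop_words` lists words that should not count
    as a match on their own.
    """
    def explode_words(df):
        # One row per lower-cased word found in `col`
        out = df.assign(key=df[col].str.lower().str.findall(r"\w+")).explode("key")
        # Rows with no words explode to NaN; drop them so NaN never joins NaN
        out = out.dropna(subset=["key"])
        return out[~out["key"].isin(list(stop_words))]

    return (
        explode_words(left)
        .merge(explode_words(right), on="key", suffixes=("", "2"))
        .drop(columns="key")
        .drop_duplicates()
        .reset_index(drop=True)
    )


# Assumed stop-word list, chosen by hand for this sample data
result = merge_on_shared_words(data1, data2, stop_words={"is", "the", "my", "your"})
print(result)
```

With this stop-word list, "there is rain" is paired with "the rain is beautiful" through the shared word "rain", and "today is my birthday" is paired with both birthday rows. Which words to ignore depends on the data, so the list above should be treated as a placeholder.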

Otherwise, it's a matter of doing a cross join and applying str.contains(...) to filter out the pairs that don't match.
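
A minimal sketch of the cross-join variant, assuming pandas 1.2 or later (for `how="cross"`) and the sample frames from the question. Because the substring to look for differs on every row, this sketch swaps in Python's `in` operator row by row in place of a single `str.contains` call; that substitution is my assumption about how to apply the idea, not something the answer spells out.

```python
import pandas as pd

data1 = pd.DataFrame({
    "email": ["hi@amal.com", "hi2@amal.com", "hi3@amal.com", "hi4@amal.com"],
    "is_mane": [1, 1, 1, 1],
    "name": ["there is rain", "here is the food",
             "let's go together", "today is my birthday"],
    "id": [10, 9, 8, 6],
})
data2 = pd.DataFrame({
    "id": [1, 1, 2, 4, 3],
    "name": ["the rain is beautiful", "the food", "together",
             "my birthday", "your birthday"],
})

# Every row of data1 paired with every row of data2 (4 x 5 = 20 rows);
# the overlapping columns from data2 become id2 and name2
pairs = data1.merge(data2, how="cross", suffixes=("", "2"))

# Keep a pair only when the whole data2 name occurs inside the data1 name
mask = [needle in haystack
        for haystack, needle in zip(pairs["name"], pairs["name2"])]
result = pairs[mask].reset_index(drop=True)
print(result)
```

This keeps the same three rows as the filter in the question, now with `id2` and `name2` attached. It does not pair "today is my birthday" with "your birthday", since that row only shares the word "birthday"; that case needs word-level matching. The cross join grows as len(data1) * len(data2), so it may be too large for big frames.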

Answer 1 (2) 2020-01-26 12:46:42
