Finding "matches" between words in one column and sentences in other columns

Question

I have three datasets that I would like to join. The datasets have different lengths (125, 200, 1000).

One     | Two                   | Three
man     | man and woman         | there was a cat
nutella | lemon water           | pancakes
bread   | bread and nutella     | look at you
glass   | wine and water        | table

I would like to "connect" each word in One to all the rows in Two and Three that contain that word, like this:

man : man and woman
nutella : bread and nutella
bread : bread and nutella
glass :

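If a word cannot be connected to anything (glass, for example), I would like to put all such words under a new word called 'Other'.

Could you tell me whether it is right to first search the other two columns for the words (using str.contains or re.findall) and then use zip?

My concern, however, is how to link each word in One to the other words in Two and Three. I could do it manually (adding man as the search term first, then nutella, and so on), but I would like to know whether it can be done automatically (for example, by converting the first column to a list).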

Answer 1

Let's try findall together with melt and explode. Here glass is dropped because it is not found.

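import re
# One alternation of whole-word patterns built from the words in One
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in df.One)
# Stack Two and Three into a single 'value' column
s = df.melt('One')
# List every word from One that occurs in each sentence
s['New'] = s.value.str.findall(pat)
# One row per match; sentences without a match become NaN and are dropped
s = s.explode('New')[['value', 'New']].dropna()
s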
Out[42]: 
               value      New
0      man and woman      man
2  bread and nutella    bread
2  bread and nutella  nutella
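The snippet above assumes a DataFrame `df` that already holds all three columns. The following is a minimal, self-contained sketch for reproducing the result. The sample data is copied from the table in the question, and the helper name `find_matches` is hypothetical. Since the real datasets have different lengths, this sketch takes the words and the sentences as separate sequences instead of melting a single frame, which is a variation on the answer rather than its exact code.

```python
import re
import pandas as pd

def find_matches(words, sentences):
    """Return one row per (sentence, matched word) pair.

    `words` and `sentences` may have different lengths.
    """
    # Whole-word alternation; re.escape guards against regex metacharacters.
    pat = "|".join(r"\b{}\b".format(re.escape(w)) for w in words)
    s = pd.DataFrame({"value": pd.Series(list(sentences), dtype="object")})
    s["New"] = s["value"].str.findall(pat)
    # explode gives one row per match; sentences with no match become NaN.
    return s.explode("New").dropna(subset=["New"]).reset_index(drop=True)

# Sample data from the question (equal lengths here only because it is a toy example).
one = ["man", "nutella", "bread", "glass"]
two = ["man and woman", "lemon water", "bread and nutella", "wine and water"]
three = ["there was a cat", "pancakes", "look at you", "table"]

matches = find_matches(one, two + three)
print(matches)
```

Passing the sentences as one concatenated list sidesteps the need to pad the shorter columns before calling melt.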

Put it into a dictionary:

d = dict(zip(s.New, s.value))
d
Out[46]: 
{'man': 'man and woman',
 'bread': 'bread and nutella',
 'nutella': 'bread and nutella'}
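Note that `dict(zip(...))` keeps only one sentence per word, and words without a match (such as glass) do not appear at all. Below is a hedged sketch of one way to keep every matching sentence and to gather the unmatched words under 'Other', as the question asks. The `matches` frame is hand-built to mirror the exploded output shown earlier, and the variable names are illustrative only.

```python
import pandas as pd

# Hand-built copy of the exploded result (assumption: same two columns).
matches = pd.DataFrame({
    "value": ["man and woman", "bread and nutella", "bread and nutella"],
    "New": ["man", "bread", "nutella"],
})
words = ["man", "nutella", "bread", "glass"]

# Collect every matching sentence per word instead of only the last one.
d = matches.groupby("New")["value"].agg(list).to_dict()

# Words from One that matched nothing are gathered under a single 'Other' key.
d["Other"] = [w for w in words if w not in d]
print(d)
```

Storing lists as values means a word that occurs in several rows of Two or Three keeps all of them.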

Solution 1
0 Accepted 2020-08-15 01:56:20