
How to match words in 2 lists against another string of words without substring matching in Python?

I have 2 lists of keywords:

slangNames = ["Vikes", "Demmies", "D", "MS Contin"]
riskNames = ["enough", "pop", "final", "stress", "trade"]

I also have a dictionary called overallDict that contains tweets. The key-value pairs are {ID: tweet text}, for example:

{1:"Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

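I am trying to isolate only the tweets that contain at least one keyword from each of slangNames and riskNames. That is, a tweet must contain some keyword from slangNames and some keyword from riskNames. So, for the example above, my code should return keys 1 and 3, i.e.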

{1:"Vikes is not enough for me", 3:"pop a D"}. 

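But my code matches substrings rather than whole words, so basically anything containing the letter "D" gets picked up. How can I match these as whole words instead of substrings? Please help. Thanks!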

My code so far is as follows:

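output = []
# strippedRisks / strippedSlangs are not shown in the question; presumably the keyword lists above
for key in overallDict:
    # `x in overallDict[key]` is a substring test on the tweet text, so "D" also matches "Demmies"
    if any(x in overallDict[key] for x in strippedRisks) and any(x in overallDict[key] for x in strippedSlangs):
        output.append(key)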
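To see why the loop above over-matches, here is a small sketch contrasting Python's substring test (`keyword in some_string`) with membership in the list of whitespace-separated words. The sample strings come from the question; the helper names `matches_substring` and `matches_whole_word` are hypothetical.

```python
def matches_substring(keyword: str, text: str) -> bool:
    # `in` between two strings tests for a substring anywhere in the text
    return keyword in text


def matches_whole_word(keyword: str, text: str) -> bool:
    # `in` on a list tests for an exact element, i.e. a whole whitespace-separated word
    return keyword in text.split()


print(matches_substring("D", "Demmies is okay"))   # True: "D" is the first letter of "Demmies"
print(matches_whole_word("D", "Demmies is okay"))  # False: no word is exactly "D"
print(matches_whole_word("D", "pop a D"))          # True
```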

Store slangNames and riskNames as sets, split each tweet into words, and check that at least one of its words appears in each of the two sets:

slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d = {1: "Vikes is not enough for me", 2: "Demmies is okay", 3: "pop a D"}

for k, v in d.items():
    spl = v.split()  # split once
    if any(word in slangNames for word in spl) and any(word in riskNames for word in spl):
        print(k, v)

Output:

1 Vikes is not enough for me
3 pop a D
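One caveat with `str.split()`: punctuation stays attached to words, so a tweet such as "Vikes, not enough!" yields the tokens "Vikes," and "enough!", which are in neither set. A minimal variation, assuming punctuation should simply be ignored, tokenizes with `re.findall(r"\w+", ...)` and tests set intersections instead of `any`. The function name `filter_tweets` and the punctuated sample tweets are hypothetical.

```python
import re

slangNames = {"Vikes", "Demmies", "D", "MS", "Contin"}
riskNames = {"enough", "pop", "final", "stress", "trade"}


def filter_tweets(tweets, slang, risk):
    """Return {id: text} for tweets containing at least one word from each set."""
    result = {}
    for tweet_id, text in tweets.items():
        # \w+ keeps runs of letters, digits and underscores and drops punctuation
        words = set(re.findall(r"\w+", text))
        # a non-empty intersection means at least one whole-word match
        if words & slang and words & risk:
            result[tweet_id] = text
    return result


tweets = {1: "Vikes, not enough!", 2: "Demmies is okay", 3: "pop a D."}
print(filter_tweets(tweets, slangNames, riskNames))
# {1: 'Vikes, not enough!', 3: 'pop a D.'}
```

Note that `\w+` also splits contractions (for example "don't" becomes "don" and "t"), which does not matter for these keywords.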

Or use `not set.isdisjoint`:

slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d = {1: "Vikes is not enough for me", 2: "Demmies is okay", 3: "pop a D"}

for k, v in d.items():
    spl = v.split()
    if not slangNames.isdisjoint(spl) and not riskNames.isdisjoint(spl):
        print(k, v)

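Using `any` should be efficient because it short-circuits on the first match (in CPython, `set.isdisjoint` likewise stops as soon as it finds a common element). Two sets are disjoint if their intersection is empty, so `not slangNames.isdisjoint(spl)` is True when at least one word of the tweet is in slangNames; requiring the same for riskNames means the tweet contains at least one word from each set.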
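A quick sketch of the set semantics described above, using words from the question's sample tweets (the variable names `slang`, `risk`, `words` and `other` are only for illustration):

```python
slang = {"Vikes", "Demmies", "D"}
risk = {"enough", "pop", "final", "stress", "trade"}
words = "pop a D".split()  # ['pop', 'a', 'D']

# isdisjoint() accepts any iterable, so the list from split() can be passed directly
print(slang.isdisjoint(words))    # False: "D" is in both
print(slang.intersection(words))  # {'D'}

# "not isdisjoint" says the same thing as "the intersection is non-empty"
for s in (slang, risk):
    assert (not s.isdisjoint(words)) == bool(s.intersection(words))

# A tweet qualifies only if it overlaps with both sets
other = "Demmies is okay".split()
print(not slang.isdisjoint(other) and not risk.isdisjoint(other))  # False: no risk word
```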

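If "MS Contin" is actually a single keyword (a phrase containing a space), you also need to handle it separately, since `split()` will never produce it as one token: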

import re

slangNames = set(["Vikes", "Demmies", "D"])
# "MS Contin" contains a space, so match it as a phrase with word boundaries
r = re.compile(r"\bMS Contin\b")
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d = {1: "Vikes is not enough for me", 2: "Demmies is okay", 3: "pop a D"}

for k, v in d.items():
    spl = v.split()
    # slang word OR the "MS Contin" phrase, AND at least one risk word
    if (not slangNames.isdisjoint(spl) or r.search(v)) and not riskNames.isdisjoint(spl):
        print(k, v)
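If more multi-word keywords appear, a possible generalization (a sketch, not part of the answer above) is to compile one regex per keyword list: each keyword is passed through `re.escape`, the alternatives are joined with `|`, and the whole group is wrapped in `\b` word boundaries so "D" no longer matches inside "Demmies". Matching stays case-sensitive, like the code above. The helper `keyword_pattern` and the fourth sample tweet are hypothetical.

```python
import re

slangNames = ["Vikes", "Demmies", "D", "MS Contin"]
riskNames = ["enough", "pop", "final", "stress", "trade"]


def keyword_pattern(keywords):
    """Compile one regex matching any keyword as a whole word or phrase."""
    # Longest keywords first so a phrase like "MS Contin" is tried before shorter ones;
    # re.escape guards against regex metacharacters inside keywords.
    alternatives = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    # \b on both sides rejects matches inside longer words (e.g. "D" in "Demmies")
    return re.compile(r"\b(?:" + alternatives + r")\b")


slang_re = keyword_pattern(slangNames)
risk_re = keyword_pattern(riskNames)

d = {1: "Vikes is not enough for me", 2: "Demmies is okay", 3: "pop a D",
     4: "MS Contin, final answer"}

matches = {k: v for k, v in d.items() if slang_re.search(v) and risk_re.search(v)}
print(matches)
# {1: 'Vikes is not enough for me', 3: 'pop a D', 4: 'MS Contin, final answer'}
```

Because `\b` checks the boundary between a word character and a non-word character, this works as expected for keywords that start and end with letters or digits, which holds for both lists here.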


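Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.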

© 2020-2024 STACKOOM.COM