How to match words in 2 lists against another string of words without substring matching in Python?
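I have 2 lists of keywords:
slangNames = ["Vikes", "Demmies", "D", "MS Contin"]
riskNames = ["enough", "pop", "final", "stress", "trade"]
I also have a dictionary named overallDict that contains tweets. The key-value pairs are {ID: tweet text}, for example:
{1:"Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}
I'm trying to isolate only the tweets that contain at least one keyword from both slangNames and riskNames. That is, a tweet must contain any keyword from slangNames AND any keyword from riskNames. For the example above, my code should return keys 1 and 3, i.e.
{1:"Vikes is not enough for me", 3:"pop a D"}.
However, my code picks up substrings instead of whole words, so basically every tweet containing the letter "D" is selected. How can I match these as whole words rather than substrings? Please help. Thanks!
My code so far is as follows: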
for key in overallDict:
    if any(x in overallDict[key] for x in strippedRisks) and (any(x in overallDict[key] for x in strippedSlangs)):
        output.append(key)
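To make the symptom concrete, here is a minimal sketch of why the check above picks up tweet 2. The variable names `substring_hit` and `word_hit` are illustrative only and are not part of the original code.

```python
tweet = "Demmies is okay"

# `in` applied to a string is a substring test:
# the letter "D" occurs inside the word "Demmies".
substring_hit = "D" in tweet

# `in` applied to a list of words compares whole elements,
# so "D" only matches a standalone token "D".
word_hit = "D" in tweet.split()

print(substring_hit)  # True
print(word_hit)       # False
```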
Store slangNames and riskNames as sets, split each string into words, and check that at least one word appears in each of the two sets:
slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d = {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}
for k,v in d.items():
    spl = v.split() # split once
    if any(word in slangNames for word in spl) and any(word in riskNames for word in spl):
        print(k,v)
Output:
1 Vikes is not enough for me
3 pop a D
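The question asks for a dictionary rather than printed lines. One way to get that, sketched here with a hypothetical helper `has_both` and snake_case names that do not appear in the original, is a dict comprehension over the same whole-word check:

```python
slang_names = {"Vikes", "Demmies", "D", "MS", "Contin"}
risk_names = {"enough", "pop", "final", "stress", "trade"}
overall_dict = {
    1: "Vikes is not enough for me",
    2: "Demmies is okay",
    3: "pop a D",
}

def has_both(text):
    """Return True if text has a whole word from each keyword set."""
    words = text.split()  # split once, reuse for both checks
    return (any(w in slang_names for w in words)
            and any(w in risk_names for w in words))

# Keep only the tweets that pass the check.
matches = {k: v for k, v in overall_dict.items() if has_both(v)}
print(matches)  # {1: 'Vikes is not enough for me', 3: 'pop a D'}
```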
Or use not set.isdisjoint:
slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d = {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}
for k,v in d.items():
    spl = v.split()
    if not slangNames.isdisjoint(spl) and not riskNames.isdisjoint(spl):
        print(k, v)
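Using any should be the most efficient, since it short-circuits on the first match. Two sets are disjoint if their intersection is the empty set, so if not slangNames.isdisjoint(spl) is True, at least one common word is present.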
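As a small illustration of that equivalence, the sketch below compares isdisjoint with an explicit intersection. The variable names are illustrative, and isdisjoint accepts any iterable, so the word list does not need converting to a set first.

```python
slang_names = {"Vikes", "Demmies", "D", "MS", "Contin"}

words = "pop a D".split()  # ['pop', 'a', 'D']

# isdisjoint is True only when the two collections share no element.
disjoint = slang_names.isdisjoint(words)

# The explicit intersection shows which words are shared.
common = slang_names & set(words)

print(disjoint)  # False
print(common)    # {'D'}

# A tweet with no slang keyword is disjoint and has an empty intersection.
other = "nothing to see here".split()
print(slang_names.isdisjoint(other))  # True
```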
If MS Contin is actually a single keyword, you also need to handle that case:
import re
slangNames = set(["Vikes", "Demmies", "D"])
r = re.compile(r"\bMS Contin\b")
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d = {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}
for k,v in d.items():
    spl = v.split()
    if (not slangNames.isdisjoint(spl) or r.search(v)) and not riskNames.isdisjoint(spl):
        print(k,v)
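The same \b idea can be extended to every keyword, which also copes with trailing punctuation that str.split() leaves attached to a word (for example "D."). This is only a sketch of one possible generalisation, not the approach given above: `build_pattern` is a hypothetical helper, and tweets 3 (with the added period) and 4 are made-up test data. Matching is case-sensitive as written.

```python
import re

slang_names = ["Vikes", "Demmies", "D", "MS Contin"]
risk_names = ["enough", "pop", "final", "stress", "trade"]

def build_pattern(keywords):
    """Compile one regex matching any keyword as a whole word or phrase."""
    # Longest keywords first, so a phrase is tried before shorter alternatives.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b")

slang_re = build_pattern(slang_names)
risk_re = build_pattern(risk_names)

# Hypothetical sample data: tweet 3 ends with a period, tweet 4 uses the phrase.
tweets = {
    1: "Vikes is not enough for me",
    2: "Demmies is okay",
    3: "pop a D.",
    4: "MS Contin under stress",
}

matches = {k: v for k, v in tweets.items()
           if slang_re.search(v) and risk_re.search(v)}
print(sorted(matches))  # [1, 3, 4]
```

The trade-off is that a regex search scans each tweet twice, whereas the set-based versions do hash lookups per word; for a handful of keywords either should be fine.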