How to check if multiple items from a list appear in a string?
Suppose I have a list of keywords:
keywords = ["history terms","history words","history vocab","history words terms","history vocab words","science list","science terms vocab","math terms words vocab"]
and a list of main terms:
`main_terms = ["terms","words","vocab","list"]`
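Update, to state the problem more clearly:
The script I'm writing removes near-duplicates from a long list of keywords. I've already managed to remove misspellings and slight variations (for example, close variants of "history terms").
My problem is that I look for several terms in this keyword list, and once I have found one of those terms in a keyword (e.g. "history terms"), every keyword that is otherwise identical but uses a different term or combination of terms (e.g. "history vocab", "history words", "history words terms", etc.) should be treated as a duplicate.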
Iterate over `keywords` and check each keyword against `main_terms`:
keywords = ["history terms",
            "history words",
            "history vocab",
            "history words terms",
            "history vocab words",
            "science list",
            "science terms vocab",
            "math terms words vocab"]
main_terms = {"terms", "words", "vocab", "list"}
result = {}
for words in keywords:
    s = set(words.split())
    # Whatever is left after removing the main terms is the subject.
    s_subject = s - main_terms
    subject = s_subject and next(iter(s_subject))
    # Require at least one main term (set intersection), and keep only
    # the first keyword seen for each subject.
    if s & main_terms and subject and subject not in result:
        result[subject] = words
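Convert the result values to a list (dict order is only guaranteed from Python 3.7 on, so the order you see may differ from the output below):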
>>> list(result.values())
['math terms words vocab', 'history terms', 'science list']
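The same idea can be wrapped in a function. This is only a sketch under one assumption that is not in the answer: the subject is taken to be all non-term words of a keyword, in order, whereas the code above picks a single arbitrary non-term word. The name `dedupe_by_subject` is made up for illustration.

```python
def dedupe_by_subject(keywords, main_terms):
    """Keep the first keyword seen for each subject.

    Assumption: the subject is every word that is not a main term,
    kept in its original order.
    """
    terms = set(main_terms)
    first_seen = {}  # subject tuple -> first keyword with that subject
    for keyword in keywords:
        words = keyword.split()
        subject = tuple(w for w in words if w not in terms)
        has_term = any(w in terms for w in words)
        if subject and has_term and subject not in first_seen:
            first_seen[subject] = keyword
    # Plain dicts preserve insertion order on Python 3.7+.
    return list(first_seen.values())


keywords = ["history terms", "history words", "history vocab",
            "history words terms", "history vocab words", "science list",
            "science terms vocab", "math terms words vocab"]
main_terms = ["terms", "words", "vocab", "list"]
print(dedupe_by_subject(keywords, main_terms))
```

With multi-word subjects such as "world history terms" and "art history terms", this version keeps both, because the whole subject is compared rather than a single word.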
I'm sure there is a more elegant solution, but the following seems to be what you are looking for, at least for part 1):
>>> def remove_main_terms(keyword):
...     words = keyword.split()
...     count = 0
...     to_keep = []
...     for word in words:
...         if word in main_terms:
...             count += 1
...         # Keep everything up to, but not including, the second main term.
...         if count < 2:
...             to_keep.append(word)
...     return " ".join(to_keep)
...
>>> keywords = ["history terms","history words","history vocab","history words terms","history vocab words","science list","science terms vocab","math terms words vocab"]
>>> main_terms = ["terms","words","vocab","list"]
>>> new_list = []
>>> for w in keywords:
...     new_list.append(remove_main_terms(w))
...
>>> new_list
['history terms', 'history words', 'history vocab', 'history words', 'history vocab', 'science list', 'science terms', 'math terms']
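The trimmed list still contains repeats such as 'history words' twice. One possible follow-up step, sketched here as an assumption about what comes next rather than as part of the answer, is to drop exact repeats while keeping the original order. `remove_main_terms` is restated with `main_terms` as an explicit parameter so the block runs on its own; `unique_in_order` is a hypothetical helper name.

```python
def remove_main_terms(keyword, main_terms):
    """Keep every word that comes before the second main term."""
    kept, count = [], 0
    for word in keyword.split():
        if word in main_terms:
            count += 1
        if count < 2:
            kept.append(word)
    return " ".join(kept)


def unique_in_order(items):
    """Drop exact repeats, keeping the first occurrence of each item."""
    # dict.fromkeys keeps insertion order on Python 3.7+.
    return list(dict.fromkeys(items))


keywords = ["history terms", "history words", "history vocab",
            "history words terms", "history vocab words", "science list",
            "science terms vocab", "math terms words vocab"]
main_terms = ["terms", "words", "vocab", "list"]

trimmed = [remove_main_terms(k, main_terms) for k in keywords]
print(unique_in_order(trimmed))
```

Keywords that differ only in their first main term ('history terms' and 'history words') remain distinct after this step.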
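Edit: I increasingly think you are asking an XY problem and that what you actually want is the unique subjects.
If that is the case, the following approach works better: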
result = []
for word in keywords:
    # Strip every main term out of the keyword; what remains is the subject.
    for term in main_terms:
        if term in word:
            word = word.replace(term, "")
    result.append(word.strip())
print(set(result))
Output (set order may vary): {'science', 'math', 'history'}
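One caveat with `str.replace` is that it works on substrings, so a subject such as "playlist" would lose its "list" part. A minimal sketch of a word-level variant follows; the function name `unique_subjects` and the "playlist" example are assumptions added for illustration, not part of the answer.

```python
def unique_subjects(keywords, main_terms):
    """Return each distinct subject once, in order of first appearance."""
    terms = set(main_terms)
    # Compare whole words so that terms embedded in longer words survive.
    subjects = (
        " ".join(w for w in keyword.split() if w not in terms)
        for keyword in keywords
    )
    # dict.fromkeys removes repeats while keeping order; skip empty subjects.
    return [s for s in dict.fromkeys(subjects) if s]


keywords = ["history terms", "history words", "history vocab",
            "history words terms", "history vocab words", "science list",
            "science terms vocab", "math terms words vocab"]
main_terms = ["terms", "words", "vocab", "list"]
print(unique_subjects(keywords, main_terms))
```

Returning a list instead of a set keeps the result in a predictable order, which can make the output easier to compare between runs.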
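The code below solves your original problem with the same result, but it does so by ignoring everything after the first word and keeping only keywords whose first word has not been seen yet.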
result = []
for word in keywords:
    found = False
    # Has an already-kept keyword used this first word?
    for res in result:
        if word.split()[0] in res:
            found = True
    if not found:
        result.append(word)
print(result)
See the demo on repl.it.
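The inner loop rescans every kept keyword and uses a substring test, so "art" would match inside "smart words". A possible variant, sketched under the assumption that the first word alone identifies the subject, tracks the first words already seen in a set. The name `first_per_leading_word` is hypothetical.

```python
def first_per_leading_word(keywords):
    """Keep the first keyword for each distinct leading word."""
    seen = set()   # leading words encountered so far
    result = []
    for keyword in keywords:
        parts = keyword.split()
        if not parts:
            continue  # skip empty or whitespace-only keywords
        first = parts[0]
        if first not in seen:
            seen.add(first)
            result.append(keyword)
    return result


keywords = ["history terms", "history words", "history vocab",
            "history words terms", "history vocab words", "science list",
            "science terms vocab", "math terms words vocab"]
print(first_per_leading_word(keywords))
```

Set membership is an exact, whole-word comparison, so the work per keyword does not grow with the number of keywords already kept.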