
How to check if multiple items from a list appear in a string?

Say I have a list of keywords:

keywords = ["history terms","history words","history vocab","history words terms","history vocab words","science list","science terms vocab","math terms words vocab"]

And a list of main terms:

`main_terms = ["terms","words","vocab","list"]`

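Update to state the problem more clearly:

The script I am writing removes near-duplicates from a long list of keywords. I have already managed to remove misspellings and slight variations (for example, "history terms" and near-identical spellings of it).

My problem is that I look for several terms in this keyword list, and once I have found one of these terms in a keyword (e.g. "history terms"), every keyword that is identical except for a different term or combination of terms (e.g. "history vocab", "history words", "history words terms") should be treated as a duplicate.

  • A keyword may contain several terms (e.g. "math terms words vocab"), as long as there is no keyword that is identical apart from having fewer terms (e.g. "math terms words" or, ideally, a single term, as in "math vocab").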

Loop over keywords and check each keyword against main_terms:

keywords = ["history terms",
            "history words",
            "history vocab",
            "history words terms",
            "history vocab words",
            "science list",
            "science terms vocab",
            "math terms words vocab"]
main_terms = {"terms","words","vocab","list"}
result = {}
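for words in keywords:
    s = set(words.split())
    # Whatever is left after removing the main terms is treated as the subject.
    s_subject = s - main_terms
    subject = next(iter(s_subject), None)
    # Keep only the first keyword seen for each subject, and only if the
    # keyword contains at least one main term (set intersection, not union).
    if s & main_terms and subject and subject not in result:
        result[subject] = words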

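Convert the result values to a list (on Python 3.7+, dicts preserve insertion order, so the kept keywords come out in the order they were first seen):

>>> list(result.values())
['history terms', 'science list', 'math terms words vocab']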
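
The loop above keeps whichever keyword it meets first for each subject. The question also mentions preferring the variant with the fewest terms, so here is a minimal sketch of that variation. The function name `dedupe_by_subject` is hypothetical, and the sketch assumes that every word in a keyword that is not a main term belongs to the subject.

```python
def dedupe_by_subject(keywords, main_terms):
    """Keep one keyword per subject, preferring the one with the fewest main terms."""
    main_terms = set(main_terms)
    best = {}  # subject (tuple of non-term words) -> (term_count, keyword)
    for keyword in keywords:
        words = keyword.split()
        subject = tuple(w for w in words if w not in main_terms)
        term_count = sum(1 for w in words if w in main_terms)
        if not subject or term_count == 0:
            # Assumption: keywords without a subject or without any main term are skipped.
            continue
        # Strict "<" keeps the earliest keyword when term counts tie.
        if subject not in best or term_count < best[subject][0]:
            best[subject] = (term_count, keyword)
    return [keyword for _, keyword in best.values()]


keywords = ["history terms", "history words", "history vocab",
            "history words terms", "history vocab words", "science list",
            "science terms vocab", "math terms words vocab"]
main_terms = ["terms", "words", "vocab", "list"]

print(dedupe_by_subject(keywords, main_terms))
# ['history terms', 'science list', 'math terms words vocab']
```

Because the subject is stored as a tuple, multi-word subjects are handled as well, which the single-word `next(iter(...))` approach does not attempt.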

I am sure there is a more elegant solution, but the following seems to do what you are looking for, at least for part 1:

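>>> def remove_main_terms(keyword):
...     words = keyword.split()
...     count = 0  # number of main terms seen so far
...     to_keep = []
...     for word in words:
...         if word in main_terms:
...             count += 1
...         # Stop keeping words once a second main term has been reached.
...         if count < 2:
...             to_keep.append(word)
...     return " ".join(to_keep)
...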

>>> keywords = ["history terms","history words","history vocab","history words terms","history vocab words","science list","science terms vocab","math terms words vocab"]

>>> main_terms = ["terms","words","vocab","list"]

>>> new_list = []
>>> for w in keywords:
...     new_list.append(remove_main_terms(w))
...

>>> new_list
['history terms', 'history words', 'history vocab', 'history words', 'history vocab', 'science list', 'science terms', 'math terms']
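
Note that `new_list` still contains repeated entries such as 'history words'. A small sketch of dropping them while keeping first-seen order, relying on `dict.fromkeys` (dict keys are unique and, on Python 3.7+, keep insertion order). The name `unique_keywords` is made up for illustration.

```python
new_list = ['history terms', 'history words', 'history vocab', 'history words',
            'history vocab', 'science list', 'science terms', 'math terms']

# dict.fromkeys builds a dict whose keys are the list items, so repeats collapse.
unique_keywords = list(dict.fromkeys(new_list))
print(unique_keywords)
# ['history terms', 'history words', 'history vocab', 'science list', 'science terms', 'math terms']
```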

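Edit: I increasingly think you are asking an XY problem and that what you actually want is the unique subjects.

If that is the case, the following approach works better: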

result = []
for word in keywords:
    # Strip every main term out of the keyword; what remains is the subject.
    for term in main_terms:
        word = word.replace(term, "")
    result.append(word.strip())

print(set(result))

Output (set ordering may vary): {'science', 'math', 'history'}
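
One caveat with `str.replace` is that it works on substrings, so a term is also removed from inside a longer word (for example, `words` inside `wordsmith`). A hedged alternative that compares whole words instead is sketched below. `extract_subject` is a hypothetical helper name, and the keyword "wordsmith terms" is invented purely to show the difference.

```python
def extract_subject(keyword, main_terms):
    """Return the keyword with every whole-word main term removed."""
    terms = set(main_terms)
    return " ".join(w for w in keyword.split() if w not in terms)


main_terms = ["terms", "words", "vocab", "list"]
# "wordsmith terms" is a made-up keyword, not part of the original data.
keywords = ["history words terms", "science list", "wordsmith terms"]

subjects = {extract_subject(k, main_terms) for k in keywords}
print(sorted(subjects))
# ['history', 'science', 'wordsmith']
```

With the substring-based version, "wordsmith terms" would be reduced to "mith" rather than "wordsmith".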


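The following version solves your original problem with the same result, but does so by ignoring the terms after the first word and only letting through keywords whose first word has not been seen before.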

result = []
seen_subjects = set()
for keyword in keywords:
    # The first word of each keyword is assumed to be its subject.
    subject = keyword.split()[0]
    if subject not in seen_subjects:
        seen_subjects.add(subject)
        result.append(keyword)
print(result)

See the demo on repl.it.
