How do I extract a sequence of contiguous prespecified words from a list?

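I have two lists:

list_1 is a list of words I am interested in
list_2 is a tokenized sequence of words extracted from a text

What I want to do is extract sequences of words from list_2 if they are contained in list_1, and concatenate them as long as the following word in list_2 is also contained in list_1.

Unfortunately, I don't know how to start. Any tips would be much appreciated.

Best regards!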

You could try:

text = ("What I want to do is to extract sequences of words out of list_2 "
        "if they are contained in list_1 and concatenate them as long as the "
        "following word in list_2 is also contained in list_1. Is to")

list1 = ["is", "to", "do"]
list2 = text.lower().split(" ")

def extract(list2, list1):
    res = []      # completed runs of matching words
    string = ""   # run currently being built
    for word in list2:
        if word in list1:
            string += " " + word
        elif string:
            # A non-matching word ends the current run.
            res.append(string.strip())
            string = ""
    # Flush the last run only if the input ended on a match;
    # otherwise an empty string would be appended.
    if string:
        res.append(string.strip())
    return res

extract(list2, list1)

['to do is to', 'is', 'is to']
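
As an alternative to the explicit loop, the same grouping can be sketched with `itertools.groupby`, which splits the token list into consecutive runs that share the same "is this word wanted?" answer. The names `extract_runs`, `tokens` and `wanted` are made up for this illustration; converting `wanted` to a set is an optional tweak that keeps the membership test fast for long word lists.

```python
from itertools import groupby

def extract_runs(tokens, wanted):
    """Join each maximal run of consecutive tokens that appear in `wanted`."""
    wanted_set = set(wanted)  # set lookup instead of scanning a list
    return [
        " ".join(group)
        # groupby yields (key, run) pairs for consecutive tokens with equal keys
        for is_wanted, group in groupby(tokens, key=lambda w: w in wanted_set)
        if is_wanted  # keep only the runs made of wanted words
    ]

tokens = "what i want to do is to extract".split()
print(extract_runs(tokens, ["is", "to", "do"]))  # ['to do is to']
```

Because runs of non-matching words are simply skipped, this version never produces an empty string, whether or not the input ends on a match.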

