Extract all matching substrings in the order they appear in the original sentence

Question

I have a list that stores predefined keywords, for example:

keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]

Now I have a sentence:

str = "A tiger shark spotted here, and a white shark, and a funnel web spider"

From this sentence, I want to produce the result ["tiger shark", "white shark", "funnel web spider"]: the keywords that occur in the sentence, in the order in which they occur there. So I wrote code like this:

result = []
for i in keywords:
    if not str.find(i) == -1:
        result.append(i)

This gives me ["white shark", "tiger shark", "funnel web spider"]. The order differs from my expected result, and my mistake is quite obvious.

So my question is how to get the result in the correct order. I think the only way is to traverse the sentence and check it against the keyword list, but that seems quite complicated because it involves many combinations. Any help? Thank you so much.

Answer 1

It's because you're appending them in the order of keywords. Instead, save the index at which each keyword appears in my_str, then sort the keywords by that index:

keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]
my_str = "A tiger shark spotted here, and a white shark, and a funnel web spider"

result = []
for keyword in keywords:
    idx = my_str.find(keyword)
    if idx != -1:
        result.append((idx, keyword))

result = [i[1] for i in sorted(result)]  # Sorts by first item in tuple, idx

print(result) # -> ['tiger shark', 'white shark', 'funnel web spider']
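
Note that str.find reports only the first occurrence of each keyword, so a keyword that appears twice is listed once. If every occurrence is wanted, the same idea can be extended by calling find repeatedly with a start offset. The following is a minimal sketch under that assumption; the function name find_all_in_order and the example sentence are made up for illustration and are not part of the original answer.

```python
def find_all_in_order(text, keywords):
    """Return every occurrence of every keyword, ordered by position in text."""
    hits = []
    for keyword in keywords:
        if not keyword:
            continue  # an empty keyword would match at every position
        start = text.find(keyword)
        while start != -1:
            hits.append((start, keyword))
            # Resume the search one character past the previous hit
            start = text.find(keyword, start + 1)
    # Tuples sort by their first item, the index, so this restores sentence order
    return [keyword for _, keyword in sorted(hits)]


keywords = ["white shark", "tiger shark", "funnel web spider", "inland taipan"]
my_str = "A tiger shark, then a white shark, then another tiger shark"
print(find_all_in_order(my_str, keywords))
# -> ['tiger shark', 'white shark', 'tiger shark']
```

Resuming at start + 1 rather than start + len(keyword) also reports overlapping occurrences; use the latter if overlaps should be skipped.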

Answer 2

You could build a list of (index, keyword) tuples and sort it, then extract the keywords from the tuples whose index is not -1.

keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]
sentence = "A tiger shark spotted here, and a white shark, and a funnel web spider"

result = [ k for i,k in sorted( (sentence.find(k),k) for k in keywords) if i != -1 ]

print(result)
# ['tiger shark', 'white shark', 'funnel web spider']

You could also use a regular expression (from the re module); re.findall returns the matches in the order they occur in the sentence:

import re

# re.escape keeps any regex metacharacters inside a keyword literal
result = re.findall("|".join(map(re.escape, keywords)), sentence)
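
One caveat with the alternation approach: the regex engine tries the alternatives from left to right, so if one keyword is a prefix of another (say "tiger" and "tiger shark"), the shorter one can win when it is listed first. A possible workaround, sketched below, is to put longer keywords first. The function name find_keywords_regex and the keyword list containing "tiger" are hypothetical and chosen only to show the overlap.

```python
import re


def find_keywords_regex(text, keywords):
    """Return keyword matches in order of appearance, preferring longer keywords."""
    if not keywords:
        return []  # an empty pattern would match the empty string everywhere
    # Longest keywords first so "tiger shark" is tried before "tiger"
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    return pattern.findall(text)


keywords = ["tiger", "tiger shark", "funnel web spider"]
sentence = "A tiger shark, a tiger, and a funnel web spider"
print(find_keywords_regex(sentence, keywords))
# -> ['tiger shark', 'tiger', 'funnel web spider']
```

Unlike the find-based versions, findall also returns repeated keywords once per non-overlapping occurrence.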

Answer 1: score 4, accepted, 2019-04-15 16:06:43

Answer 2: score 1, 2019-04-15 16:53:14
