Extract all matched substrings in the order they appear in the original sentence

I have a list that stores predefined keywords, for example:

keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]

Now I have a sentence:

str = "A tiger shark spotted here, and a white shark, and a funnel web spider"

From this sentence, I want to produce the result ["tiger shark", "white shark", "funnel web spider"]: the keywords found in the sentence, in the order in which they appear there. I wrote code like this:

result = []
for i in keywords:
    if str.find(i) != -1:
        result.append(i)

This gives me ["white shark", "tiger shark", "funnel web spider"], which is not the order I expected. My mistake is quite obvious: the loop follows the order of the keyword list, not the order of the sentence.

So my question is how to get the result in the correct order. I think the only way is to traverse the string and check it against the keyword list, but that seems complicated because it involves many combinations. Any help? Thank you so much.

That's because you append them in the order of keywords. Instead, save the index at which each keyword appears in my_str (the sentence, renamed here so it does not shadow the built-in str), then sort the keywords by that index:

keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]
my_str = "A tiger shark spotted here, and a white shark, and a funnel web spider"

result = []
for keyword in keywords:
    idx = my_str.find(keyword)
    if idx != -1:
        result.append((idx, keyword))

result = [i[1] for i in sorted(result)]  # Sorts by first item in tuple, idx

print(result) # -> ['tiger shark', 'white shark', 'funnel web spider']
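Note that find reports only the first occurrence of each keyword, so a keyword that appears twice in the sentence is listed once. If every occurrence is wanted, the same idea can be extended roughly as follows. This is only a sketch: the helper name find_all_in_order and the sample sentence are made up for illustration and are not part of the original answer.

```python
def find_all_in_order(text, keywords):
    """Return every keyword occurrence in text, ordered by position."""
    hits = []
    for keyword in keywords:
        start = text.find(keyword)
        while start != -1:
            hits.append((start, keyword))
            # Resume the search one character past the start of this hit.
            start = text.find(keyword, start + 1)
    # Tuples sort by their first item, the position in text.
    return [keyword for _, keyword in sorted(hits)]


keywords = ["white shark", "tiger shark", "funnel web spider", "inland taipan"]
my_str = "A tiger shark, a white shark, and then another tiger shark"

print(find_all_in_order(my_str, keywords))
# -> ['tiger shark', 'white shark', 'tiger shark']
```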

You could build a list of (index, keyword) tuples and sort it, then extract the keywords from the tuples whose index is not -1.

keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]
sentence = "A tiger shark spotted here, and a white shark, and a funnel web spider"

result = [k for i, k in sorted((sentence.find(k), k) for k in keywords) if i != -1]

print(result)
# ['tiger shark', 'white shark', 'funnel web spider']
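A possible variant, shown only as a sketch, filters first and then sorts with sentence.find as the key, which avoids building the tuples. One difference to be aware of: because sorted is stable, two keywords that start at the same position keep their order from the keywords list, whereas the tuple version orders them alphabetically.

```python
keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]
sentence = "A tiger shark spotted here, and a white shark, and a funnel web spider"

# Keep only the keywords present in the sentence,
# then order them by the position of their first occurrence.
result = sorted((k for k in keywords if k in sentence), key=sentence.find)

print(result)
# ['tiger shark', 'white shark', 'funnel web spider']
```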

You could also use a regular expression (from the re module). findall scans the sentence from left to right, so the matches come back in order of appearance:

import re

# re.escape guards against keywords that contain regex metacharacters
result = re.findall("|".join(map(re.escape, keywords)), sentence)
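A slightly more defensive version of the regular-expression approach might look like the sketch below. It assumes the keywords should be matched literally, so it escapes them with re.escape, and it tries longer keywords first so that a keyword which is a prefix of another does not hide the longer one. The variable names ordered and pattern are illustrative only.

```python
import re

keywords = ["white shark", "tiger shark", "funnel web spider", "inland taipan"]
sentence = "A tiger shark spotted here, and a white shark, and a funnel web spider"

# Alternation tries branches left to right, so put longer keywords first.
ordered = sorted(keywords, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in ordered))

# finditer walks the sentence left to right, so matches arrive in order.
result = [m.group() for m in pattern.finditer(sentence)]

print(result)
# ['tiger shark', 'white shark', 'funnel web spider']
```

Unlike the find-based answers, this returns every non-overlapping occurrence, so a keyword that appears twice is listed twice. If positions are needed as well, m.start() is available on each match object.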

