Extract all matched substrings in the order they appear in the original sentence
I have a list that stores predefined keywords, for example:
keywords = [
    "white shark",
    "tiger shark",
    "funnel web spider",
    "inland taipan"]
Now I have a sentence:
str = "A tiger shark spotted here, and a white shark, and a funnel web spider"
From this sentence, I want to produce the result ["tiger shark", "white shark", "funnel web spider"]: the keywords that appear in the sentence, in the order in which they appear there. So I wrote this code:
result = []
for i in keywords:
    if str.find(i) != -1:
        result.append(i)
This gives me ["white shark", "tiger shark", "funnel web spider"], which is not in the order I expected, and my mistake is quite obvious.
So my question is how to get the results in the correct order. I think the only way is to use the string to traverse the keyword list, but that seems quite complicated because it involves many combinations. Any help? Thank you so much.
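The "traverse the string" idea is less complicated than it sounds. A minimal sketch, assuming plain substring matching with no word-boundary checks: walk the sentence one position at a time and test every keyword there with str.startswith. The function name keywords_in_order is hypothetical, and preferring the longest keyword at a given position is a design choice of this sketch.

```python
keywords = ["white shark", "tiger shark", "funnel web spider", "inland taipan"]
sentence = "A tiger shark spotted here, and a white shark, and a funnel web spider"


def keywords_in_order(text, words):
    """Scan `text` left to right and collect each keyword where it starts."""
    result = []
    pos = 0
    while pos < len(text):
        # Try every keyword at the current position; keep the longest match.
        match = max((w for w in words if text.startswith(w, pos)),
                    key=len, default=None)
        if match is None:
            pos += 1
        else:
            result.append(match)
            pos += len(match)  # skip past the matched keyword
    return result


print(keywords_in_order(sentence, keywords))
# -> ['tiger shark', 'white shark', 'funnel web spider']
```

Unlike a single find per keyword, this reports a keyword again each time it reappears, and it never returns two overlapping matches.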
It's because you're appending them in the order of keywords. Instead, save the index at which each keyword appears in my_str, so that you can later order the words by their occurrence in my_str:
keywords = [
"white shark",
"tiger shark",
"funnel web spider",
"inland taipan"]
my_str = "A tiger shark spotted here, and a white shark, and a funnel web spider"
result = []
for keyword in keywords:
idx = my_str.find(keyword)
if idx != -1:
result.append((idx, keyword))
result = [i[1] for i in sorted(result)] # Sorts by first item in tuple, idx
print(result) # -> ['tiger shark', 'white shark', 'funnel web spider']
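This records only the first occurrence of each keyword, because str.find stops at the first match. If a keyword can appear more than once, the same sort-by-index idea extends by restarting the search with find's start argument. A sketch, assuming overlapping occurrences of the same keyword are acceptable; the function all_matches_in_order and the example sentence are hypothetical.

```python
keywords = ["white shark", "tiger shark", "funnel web spider", "inland taipan"]
sentence = "A tiger shark and a white shark, then another tiger shark"


def all_matches_in_order(text, words):
    """Return every occurrence of each word in `text`, ordered by position."""
    found = []
    for word in words:
        start = text.find(word)
        while start != -1:
            found.append((start, word))
            # Resume the search one character after this occurrence's start.
            start = text.find(word, start + 1)
    # Tuples sort by index first, so this yields position order.
    return [word for _, word in sorted(found)]


print(all_matches_in_order(sentence, keywords))
# -> ['tiger shark', 'white shark', 'tiger shark']
```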
You could build a list of (index, keyword) tuples and sort it, then extract the keywords from the matching tuples.
keywords = [
"white shark",
"tiger shark",
"funnel web spider",
"inland taipan"]
sentence = "A tiger shark spotted here, and a white shark, and a funnel web spider"
result = [k for i, k in sorted((sentence.find(k), k) for k in keywords) if i != -1]
print(result)
# ['tiger shark', 'white shark', 'funnel web spider']
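The same ordering can also be written with the key argument of sorted: filter out the absent keywords first, then pass sentence.find as the key so that each keyword is ordered by the index of its first occurrence. A minimal sketch reusing the question's data:

```python
keywords = ["white shark", "tiger shark", "funnel web spider", "inland taipan"]
sentence = "A tiger shark spotted here, and a white shark, and a funnel web spider"

# Keep only the keywords that occur in the sentence.
present = [k for k in keywords if k in sentence]
# Order them by the index of their first occurrence.
result = sorted(present, key=sentence.find)
print(result)  # -> ['tiger shark', 'white shark', 'funnel web spider']
```

Filtering first matters: find returns -1 for a missing keyword, which would otherwise sort it to the front.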
You could also use a regular expression (from the re module):
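import re
# re.escape stops keywords containing regex metacharacters from altering the pattern
result = re.findall("|".join(map(re.escape, keywords)), sentence)
print(result)  # -> ['tiger shark', 'white shark', 'funnel web spider']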
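One caveat with joining keywords into a single alternation: Python's re tries the alternatives from left to right and keeps the first one that matches at a position, not the longest. If one keyword is a prefix of another, the shorter one can win. A sketch, assuming a hypothetical keyword list where "tiger" is a prefix of "tiger shark", that sorts the keywords longest first before building the pattern; the helper build_pattern is hypothetical.

```python
import re

# Hypothetical keywords: "tiger" is a prefix of "tiger shark".
keywords = ["tiger", "tiger shark", "white shark"]
sentence = "A tiger shark and a white shark"


def build_pattern(words):
    """Join escaped keywords into one alternation, longest keyword first."""
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(map(re.escape, ordered))


naive = re.findall("|".join(map(re.escape, keywords)), sentence)
longest_first = re.findall(build_pattern(keywords), sentence)
print(naive)          # -> ['tiger', 'white shark']
print(longest_first)  # -> ['tiger shark', 'white shark']
```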