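How to return more than one match on a list of text?
I currently have a function that yields a term and the sentence it occurs in. At the moment, the function only retrieves the first match from the list of terms. I would like to retrieve all matches, not just the first.
For example, list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
and a list of sentences text_list = ["A heart attack is a result of cardiovascular...", "Chronic intermittent hypoxia is the..."]
The ideal output is: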
['heart attack', 'a heart attack is a result of cardiovascular...'],
['cardiovascular', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']
# this is the current function
def find_word(list_of_matches, line):
    for words in list_of_matches:
        if any([words in line]):
            return words, line
# returns list of 'term, matched string'
key_vals = [list(find_word(list_of_matches, line.lower())) for line in text_list if
            find_word(list_of_matches, line.lower()) != None]
# output is currently
['heart attack', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']
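You'll want to use regular expressions here.
import re
def find_all_matches(words_to_search, text):
    matches = []
    for word in words_to_search:
        # re.search returns None when the word is absent, so check before calling .group()
        match = re.search(re.escape(word), text)
        if match:
            matches.append(match.group())
    return [matches, text]
Note that this returns a nested list: the list of all matched terms, followed by the text.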
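To get the [term, sentence] pairs the question asks for rather than one nested list per sentence, the regex approach could be adapted roughly as follows. This is only a sketch: the name find_term_pairs is hypothetical, the terms are assumed to be lowercase already, and the \b word boundaries are an added assumption (drop them if plain substring matching is wanted).

```python
import re

def find_term_pairs(terms, sentences):
    """Return a [term, sentence] pair for every term found in each sentence."""
    pairs = []
    for sentence in sentences:
        lowered = sentence.lower()
        for term in terms:
            # re.escape keeps characters such as "." or "+" in a term literal.
            # \b limits matches to whole words (an assumption, not in the original).
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                pairs.append([term, lowered])
    return pairs

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = ["A heart attack is a result of cardiovascular...",
             "Chronic intermittent hypoxia is the..."]
pairs = find_term_pairs(list_of_matches, text_list)
```

With the sample data, pairs holds three entries: two for the first sentence and one for the second.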
The solution takes 2 steps:
Given that your desired output follows the pattern
output = [ [word1, sentence1], [word2, sentence1], [word3, sentence2], ]
the function should collect every match instead of returning at the first one:
def find_word(list_of_matches, line):
    answer = []
    for words in list_of_matches:
        if words in line:
            answer.append([words, line])
    return answer
With the function above, key_vals (built with the same list comprehension as in the question) will be:
key_vals = [
    [
        ['heart attack', 'a heart attack is a result of cardiovascular...'],
        ['cardiovascular', 'a heart attack is a result of cardiovascular...']
    ],
    [
        ['hypoxia', 'chronic intermittent hypoxia is the...']
    ]
]
The second step flattens this nested list:
output = []
for word_sentence_list in key_vals:
    for word_sentence in word_sentence_list:
        output.append(word_sentence)
Finally, the output will be:
output = [
    ['heart attack', 'a heart attack is a result of cardiovascular...'],
    ['cardiovascular', 'a heart attack is a result of cardiovascular...'],
    ['hypoxia', 'chronic intermittent hypoxia is the...']
]
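As a possible shortcut, both steps can be folded into a single nested list comprehension, so no intermediate nested list is built. This is a minimal sketch, not part of the original answer; the name find_all_pairs is hypothetical, and it uses plain substring matching on the lowercased sentence, as the question's code does.

```python
def find_all_pairs(list_of_matches, text_list):
    """Return one [term, sentence] pair per term found in each lowercased sentence."""
    return [
        [word, line.lower()]
        for line in text_list          # outer loop: each sentence
        for word in list_of_matches    # inner loop: each search term
        if word in line.lower()        # keep only terms present in the sentence
    ]

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = ["A heart attack is a result of cardiovascular...",
             "Chronic intermittent hypoxia is the..."]
output = find_all_pairs(list_of_matches, text_list)
```

Sentences with no matching term contribute nothing, so the result is already flat and contains no empty lists.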