
How to return more than one match from a list of text?

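I currently have a function that yields a term and the sentence it occurs in. At the moment, the function only retrieves the first match from the list of terms. I would like to retrieve all matches, not just the first one.

For example, given list_of_matches = ["heart attack", "cardiovascular", "hypoxia"] and a list of sentences text_list = ["A heart attack is a result of cardiovascular...", "Chronic intermittent hypoxia is the..."]

The desired output is: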

['heart attack', 'a heart attack is a result of cardiovascular...'],
['cardiovascular', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']

# this is the current function
def find_word(list_of_matches, line):
    for words in list_of_matches:
        if any([words in line]):
            return words, line

# returns list of 'term, matched string'
key_vals = [list(find_word(list_of_matches, line.lower())) for line in text_list if 
find_word(list_of_matches, line.lower()) != None]

# output is currently 
['heart attack', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']

You'll want to use regular expressions here.

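import re

def find_all_matches(words_to_search, text):
    matches = []
    for word in words_to_search:
        # re.search returns None when the word is absent, so check before calling .group()
        # re.escape makes the search term match literally
        match = re.search(re.escape(word), text)
        if match:
            matches.append(match.group())
    return [matches, text]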

Note that this returns a nested list: the list of all matched terms, followed by the text.
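If the flat [term, sentence] shape from the question is what you need, the regex idea can be adapted along these lines. This is only a sketch: find_term_sentence_pairs is a hypothetical name, and it assumes the terms in list_of_matches are already lowercase and should match as whole words.

```python
import re

def find_term_sentence_pairs(terms, sentences):
    """Return a [term, sentence] pair for every term found in every sentence."""
    pairs = []
    for sentence in sentences:
        lowered = sentence.lower()
        for term in terms:
            # re.escape treats the term as literal text;
            # \b restricts the match to whole words (an assumption, drop it for substring matching)
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                pairs.append([term, lowered])
    return pairs

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]

pairs = find_term_sentence_pairs(list_of_matches, text_list)
for pair in pairs:
    print(pair)
```

Compared with a plain `in` check, the word boundaries keep a term such as "hypoxia" from matching inside a longer word.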

The solution takes 2 steps:

  1. Fix the function
  2. Process the output

Given that your desired output follows the pattern


    output = [
      [word1, sentence1],
      [word2, sentence1],
      [word3, sentence2],
    ]

  1. Fix the function: move the return out of the 'for' loop so that the loop visits every word in list_of_matches and collects all matching words, not just the first one.

It should look like this:


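    def find_word(list_of_matches, line):
        answer = []
        for words in list_of_matches:
            # a plain membership test is enough; any([...]) around it added nothing
            if words in line:
                answer.append([words, line])
        # return only after every word has been checked
        return answer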

With the function above, the output will be:


    output = [
      [
        ['heart attack', 'a heart attack is a result of cardiovascular...'],
        ['cardiovascular', 'a heart attack is a result of cardiovascular...']
      ],
      [
        ['hypoxia', 'chronic intermittent hypoxia is the...']
      ]
    ]

  2. Process the output: now take the variable "key_vals" and flatten the per-sentence lists of lists into a single list with the following code:

    output = []
    for word_sentence_list in key_vals:
        for word_sentence in word_sentence_list:
            output.append(word_sentence)

Finally, the output will be:


    output = [
     ['heart attack', 'a heart attack is a result of cardiovascular...'],
     ['cardiovascular', 'a heart attack is a result of cardiovascular...'],
     ['hypoxia', 'chronic intermittent hypoxia is the...']
    ]
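The two steps can also be folded into a single nested list comprehension. The following is a minimal sketch, not part of the original answer: find_all_words is a hypothetical name, and it assumes plain substring matching against lowercased sentences, as in the question.

```python
def find_all_words(list_of_matches, text_list):
    """Return a flat list of [term, sentence] pairs, one per term found."""
    return [
        [word, line.lower()]
        for line in text_list          # outer loop: each sentence
        for word in list_of_matches    # inner loop: each search term
        if word in line.lower()        # keep only terms that occur in the sentence
    ]

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]

output = find_all_words(list_of_matches, text_list)
for pair in output:
    print(pair)
```

Sentences without any matching term simply contribute nothing, so no separate None check or flattening pass is needed.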

