How do I return multiple matches from a list of texts?

Question

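I currently have a function that produces a term and the sentence it appears in. At the moment, the function only retrieves the first match from the list of terms. I would like to retrieve all matches, not just the first one.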

For example, given list_of_matches = ["heart attack", "cardiovascular", "hypoxia"] and a list of sentences text_list = ["A heart attack is a result of cardiovascular...", "Chronic intermittent hypoxia is the..."]

The ideal output is:

['heart attack', 'a heart attack is a result of cardiovascular...'],
['cardiovascular', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']

# this is the current function
def find_word(list_of_matches, line):
    for words in list_of_matches:
        if any([words in line]):
            return words, line

# returns list of 'term, matched string'
key_vals = [list(find_word(list_of_matches, line.lower())) for line in text_list if 
find_word(list_of_matches, line.lower()) != None]

# output is currently 
['heart attack', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']

Answer 1

You'll want to use regular expressions here.

import re

def find_all_matches(words_to_search, text):
    matches = []
    for word in words_to_search:
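        # re.search returns None when the term is absent, so check before calling .group();
        # re.escape makes any special characters in the term match literally
        match = re.search(re.escape(word), text)
        if match:
            matches.append(match.group())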
    return [matches, text]

Note that this returns a nested list: the list of all matched terms, followed by the text.
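As a rough sketch of how the regex approach could be reshaped into the flat [term, sentence] pairs the question asks for, the snippet below pairs every matching term with its sentence. The helper name `find_term_pairs`, and the use of `re.escape` with `re.IGNORECASE`, are assumptions made for illustration and are not part of the original answer.

```python
import re

def find_term_pairs(terms, sentences):
    """Return a flat list of [term, lowercased sentence] for every term found."""
    # Escape each term so characters such as '.' or '+' are matched literally.
    patterns = [(term, re.compile(re.escape(term), re.IGNORECASE)) for term in terms]
    pairs = []
    for sentence in sentences:
        for term, pattern in patterns:
            # search() returns None when the term is absent, so test it first.
            if pattern.search(sentence):
                pairs.append([term, sentence.lower()])
    return pairs

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]
print(find_term_pairs(list_of_matches, text_list))
```

Because the terms here are plain strings, a simple `term in sentence.lower()` test would work just as well; a regex mainly pays off if you later need word boundaries or other pattern features.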

Answer 2

The solution takes 2 steps:

Fix the function
Process the output

Given that your desired output follows the pattern


    output = [
      [word1, sentence1],
      [word2, sentence1],
      [word3, sentence2],
    ]

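Fix the function: the return inside the 'for' loop exits on the first match. Change it so that the loop goes through every word in list_of_matches and collects all matching words, not just the first.

It should end up like this: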


    def find_word(list_of_matches, line):
        answer = []
        for words in list_of_matches:
            if words in line:
                answer.append([words, line])
        return answer

Using the function above, the output will be:


    output = [
      [
        ['heart attack', 'a heart attack is a result of cardiovascular...'],
        ['cardiovascular', 'a heart attack is a result of cardiovascular...']
      ],
      [
        ['hypoxia', 'chronic intermittent hypoxia is the...']
      ]
    ]

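Process the output: now you need to take the variable "key_vals", which holds one list of [word, sentence] pairs per sentence, and flatten that list of lists with the following code: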

    output = []
    for word_sentence_list in key_vals:
        for word_sentence in word_sentence_list:
            output.append(word_sentence)

Finally, the output will be:


    output = [
     ['heart attack', 'a heart attack is a result of cardiovascular...'],
     ['cardiovascular', 'a heart attack is a result of cardiovascular...'],
     ['hypoxia', 'chronic intermittent hypoxia is the...']
    ]
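Putting the two steps together, a minimal end-to-end sketch might look like the following. It assumes `key_vals` is rebuilt by calling the fixed function once per lowercased sentence (the answer does not spell this step out), and it swaps the explicit nested loop for `itertools.chain.from_iterable`, which is a substitution on my part rather than the answer's own code.

```python
from itertools import chain

def find_word(list_of_matches, line):
    """Return a [term, line] pair for every term that occurs in line."""
    return [[term, line] for term in list_of_matches if term in line]

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]

# One list of pairs per sentence; the inner list is empty when nothing matches,
# so the original "!= None" filter is no longer needed.
key_vals = [find_word(list_of_matches, line.lower()) for line in text_list]

# Flatten the per-sentence lists into a single list of [term, sentence] pairs.
output = list(chain.from_iterable(key_vals))
print(output)
```

Since the function now always returns a list, sentences without any match simply contribute nothing to the flattened result.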
