How to extract sentences of text that contain keywords in a list
I'm trying to return all sentences that contain any of the words in a list, but the result only returns the sentence for the second word in the list. In the example below, I want to extract the sentences that contain "inflation" and "commodity", not just "commodity". Any help would be appreciated.
```python
text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
for word in words:
    [words.casefold() for words in words] #to ignore cases in text

def extract_word(text):
    return [sentence for sentence in text.split('.') if word in sentence]

extract_word(text)
# [' commodity prices are rising a lot']
```
The condition `if word in sentence` checks whether the iterator `word` from the `for` loop is in `sentence`. Since `"commodity"` is the last element in the list `words`, `word` still holds the string `"commodity"` after the `for` loop finishes, so that is the only keyword `extract_word` ever tests.
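To see this behaviour in isolation, here is a minimal sketch: in Python a `for` loop does not create its own scope, so the loop variable stays bound to the last element after the loop ends.

```python
words = ['inflation', 'commodity']

for word in words:
    pass  # the loop body does not matter here

# The loop variable survives the loop and keeps its last value.
print(word)  # commodity
```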
Instead, in the list comprehension you can check whether any of the elements in `words` is in `sentence`, for example:
```python
text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
sentences = [sentence for sentence in text.split(".") if any(
    w.lower() in sentence.lower() for w in words
)]
print(sentences)
# >>> ['inflation is very high', ' commodity prices are rising a lot']
```
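Note that `in` performs a substring test, so `'inflation'` would also match a sentence containing only `'inflationary'`. If only whole words should count, one alternative (not part of the original answers) is a regular expression with `\b` word boundaries. The sketch below assumes sentences are still delimited by `.`; the function name `extract_sentences` is hypothetical.

```python
import re

def extract_sentences(text, words):
    # Hypothetical helper: one case-insensitive pattern matching any keyword as a whole word.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b', re.IGNORECASE)
    # Split on '.', trim surrounding whitespace, keep sentences where the pattern is found.
    return [s.strip() for s in text.split('.') if pattern.search(s)]

text = 'Inflation is very high. inflationary pressure. commodity prices are rising a lot. this is an extra sentence'
print(extract_sentences(text, ['inflation', 'commodity']))
# ['Inflation is very high', 'commodity prices are rising a lot']
```

`re.escape` keeps keywords containing characters such as `.` or `+` from being interpreted as regex syntax.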
I find generators pretty handy in cases like this.
```python
def extract_word(text):
    words = ['inflation', 'commodity']
    sentences = text.split('.')
    for sentence in sentences:
        if any(word in sentence for word in words):
            yield sentence
```

```
>>> list(extract_word('inflation is very high. commodity prices are rising a lot. this is an extra sentence'))
['inflation is very high', ' commodity prices are rising a lot']
```
It's readable, and it's easy to see what the result will be.
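A possible variation on the generator above, sketched under a few assumptions: the keyword list is passed in as a parameter, matching is made case-insensitive with `str.casefold`, and sentences are stripped of surrounding whitespace. The name `iter_matching_sentences` is hypothetical. The matching is lazy, although `text.split('.')` still builds the full list of sentences up front.

```python
def iter_matching_sentences(text, words):
    """Yield each '.'-separated sentence that contains any keyword (case-insensitive)."""
    keywords = [w.casefold() for w in words]  # fold the keywords once
    for sentence in text.split('.'):
        folded = sentence.casefold()
        if any(k in folded for k in keywords):
            yield sentence.strip()

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
gen = iter_matching_sentences(text, ['inflation', 'commodity'])
print(next(gen))  # inflation is very high
print(list(gen))  # ['commodity prices are rising a lot']
```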
You can try this:
```python
texts = ' commodity prices are rising a lot. some random text. this text contains the word: inflation'
words = ['inflation', 'commodity']
lst_words = [word.casefold() for word in words]  # case-fold the keywords once

def found_word(sentence, lst_words):
    # Compare whole tokens, case-folding each one so that e.g. "Inflation" also matches.
    return any(word.casefold() in lst_words for word in sentence.split())

def extract_word(text):
    lst_sentences = []
    for sentence in text.split('.'):
        if found_word(sentence, lst_words):
            lst_sentences.append(sentence + '.')  # append the string itself, not a one-item list
    return lst_sentences

extract_word(texts)
# [' commodity prices are rising a lot.',
#  ' this text contains the word: inflation.']
```
It is a bit longer, but I think it is much easier to read.
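Because `found_word` compares whole tokens from `sentence.split()`, a keyword with punctuation attached (for example `inflation,`) will not match. Below is a sketch of one variation, assuming it is acceptable to strip `string.punctuation` from the ends of each token; it also stores the keywords in a set for faster membership tests. The names `has_keyword` and `extract_keyword_sentences` are hypothetical.

```python
import string

def has_keyword(sentence, keywords):
    # Strip leading/trailing punctuation and case-fold each token before the set lookup.
    return any(token.strip(string.punctuation).casefold() in keywords
               for token in sentence.split())

def extract_keyword_sentences(text, words):
    keywords = {w.casefold() for w in words}  # set: average O(1) membership tests
    return [s.strip() + '.' for s in text.split('.') if has_keyword(s, keywords)]

texts = 'Inflation, they say, is high. some random text. commodity prices rose'
print(extract_keyword_sentences(texts, ['inflation', 'commodity']))
# ['Inflation, they say, is high.', 'commodity prices rose.']
```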