How to extract sentences of text that contain keywords in list

I'm trying to return all sentences that contain any of the words in a list, but the result only contains the sentence for the second word in the list. In the example below, I wanted to extract the sentences that contain inflation and commodity, not just commodity. Any help would be appreciated.

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
for word in words:
    [words.casefold() for words in words] #to ignore cases in text

def extract_word(text):

    return [sentence for sentence in text.split('.') if word in sentence]

extract_word(text)

[' commodity prices are rising a lot']

The condition if word in sentence checks whether word, the loop variable left over from the for loop, is in sentence. Since "commodity" is the last element in the list words, word still holds the string "commodity" after the for loop finishes, so that is the only keyword the list comprehension ever tests.
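A minimal sketch of that behaviour, reusing the names from the question (the loop body is replaced with pass here, since it has no effect on the final value of word):

```python
words = ['inflation', 'commodity']
for word in words:
    pass  # whatever the body does, `word` is rebound on every iteration

# After the loop ends, `word` is still bound to the last element of `words`.
print(word)  # commodity

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
# Only the leftover value of `word` is tested against each sentence.
matches = [sentence for sentence in text.split('.') if word in sentence]
print(matches)  # [' commodity prices are rising a lot']
```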

To fix this, check inside the list comprehension whether any of the elements in words is in sentence, as below:

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
sentences = [sentence for sentence in text.split(".") if any(
    w.lower() in sentence.lower() for w in words
)]

print(sentences)
# >>> ['inflation is very high', ' commodity prices are rising a lot']

I find generators pretty handy in cases like this.

def extract_word(text):
    words = ['inflation', 'commodity']
    sentences = text.split('.')
    for sentence in sentences:
        if any(word in sentence for word in words):
            yield sentence

>>> list(extract_word('inflation is very high. commodity prices are rising a lot. this is an extra sentence'))
['inflation is very high', ' commodity prices are rising a lot']

It's readable, and it's easy to see what the outcome is.
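One possible variation on the generator above, as a sketch only: it assumes you would rather pass the keyword list in as an argument, ignore case, and strip the leading space that text.split('.') leaves on each sentence. The name extract_sentences is made up for this example.

```python
def extract_sentences(text, words):
    """Yield each sentence of `text` that contains any keyword in `words`."""
    # Casefold the keywords once so the comparison below ignores case.
    keywords = [w.casefold() for w in words]
    for sentence in text.split('.'):
        sentence = sentence.strip()  # drop the whitespace left around each piece
        # Skip empty pieces (e.g. after a trailing '.') and non-matching sentences.
        if sentence and any(k in sentence.casefold() for k in keywords):
            yield sentence

text = 'Inflation is very high. commodity prices are rising a lot. this is an extra sentence'
result = list(extract_sentences(text, ['inflation', 'commodity']))
print(result)  # ['Inflation is very high', 'commodity prices are rising a lot']
```

Like the original, this is still a substring test, so a keyword also matches when it appears inside a longer word.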

You can try this:

texts = ' commodity prices are rising a lot. some random text. this text contains the word: inflation'
words = ['inflation','commodity']
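lst_words = [w.casefold() for w in words]  # casefold the keywords so matching ignores case

def found_word(sentence, lst_words):
    # casefold each word of the sentence too, otherwise 'Inflation' would not match
    return any(word.casefold() in lst_words for word in sentence.split())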

def extract_word(text):
    lst_sentences = []
    for sentence in text.split('.'):
        if found_word(sentence, lst_words):
            lst_sentences.append([sentence + '.'])
    return lst_sentences

extract_word(texts)
# [[' commodity prices are rising a lot.'],
#  [' this text contains the word: inflation.']]

It is a bit longer, but I think it is much easier to read.
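Because the answer above compares whole whitespace-separated words, a keyword followed directly by punctuation (for example "inflation,") would not match. A hedged sketch of one way around that, assuming it is acceptable to tokenize each sentence with the regular expression \w+; the function name extract_whole_word is hypothetical:

```python
import re

def extract_whole_word(text, words):
    """Return the sentences of `text` that contain any keyword as a whole word."""
    keywords = {w.casefold() for w in words}
    matches = []
    for sentence in text.split('.'):
        # \w+ pulls out runs of word characters, so attached punctuation is dropped.
        tokens = set(re.findall(r"\w+", sentence.casefold()))
        if tokens & keywords:  # at least one keyword appears as a whole word
            matches.append(sentence.strip())
    return matches

text = 'Inflation, they say, is high. Hyperinflation is rare. Commodity prices rise.'
result = extract_whole_word(text, ['inflation', 'commodity'])
print(result)  # ['Inflation, they say, is high', 'Commodity prices rise']
```

Here "Hyperinflation is rare" is skipped because only whole words count, whereas a substring test would have included it.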


