Python NLTK extract sentence containing a keyword

My objective is to extract the sentences from a text file that contain any word from my list of keywords. My script cleans up the text file and uses NLTK to tokenize the sentences and remove stopwords. That part of the script works and produces output that looks correct:

    ['affirming updated 2020 range guidance long-term earnings dividend growth outlooks provided earlier month', 'finally look forward increasing engagement existing prospective investors months come', 'turn']

The part of the script that extracts sentences containing a keyword does not work the way I want. It extracts the keywords, but not the sentences in which they occur. The output looks like this:

    [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'impact', 'zone']

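    import nltk
    from nltk.tokenize import word_tokenize

    # fileinB (the cleaned text), allinstops (stopwords) and keywords are defined earlier in the script
    fileinC = nltk.sent_tokenize(fileinB)
    fileinD = []
    for sent in fileinC:
        # drop stopwords from each sentence
        fileinD.append(' '.join(w for w in word_tokenize(sent) if w not in allinstops))
    fileinE = [sent.replace('\n', " ") for sent in fileinD]

    #extract sentences containing keywords
    fileinF = []
    for sent in fileinE:
        fileinF.append(' '.join(w for w in word_tokenize(sent) if w in keywords))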
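The behaviour can be reproduced without NLTK. This minimal sketch uses made-up sentences and keywords and swaps `str.split()` in for `word_tokenize` (both assumptions, so it runs without NLTK's tokenizer data). The filtering `join` from the last loop returns only the matching tokens, and an empty string for a sentence with no keyword; testing membership instead returns the whole sentence.

```python
# Hypothetical stand-in data: sentences after stopword removal, plus a keyword list.
sentences = [
    "affirming updated 2020 range guidance",
    "impact zone expected next quarter",
    "turn",
]
keywords = ["impact", "zone"]

# Same pattern as the question's last loop, with str.split() standing in for word_tokenize:
# the generator keeps only the tokens that are keywords, so the sentence itself is lost.
filtered = [" ".join(w for w in sent.split() if w in keywords) for sent in sentences]
print(filtered)  # ['', 'impact zone', '']

# Testing whether any token is a keyword, and keeping the sentence if so.
matching = [sent for sent in sentences if any(w in keywords for w in sent.split())]
print(matching)  # ['impact zone expected next quarter']
```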

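The problem is the condition inside the join on your last line: it keeps only the tokens that are keywords, so each sentence is reduced to its matching keywords, and a sentence with no keyword yields an empty entry. It is clearer to break the step down: first check whether the sentence contains any keyword, and only then append the whole sentence: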

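    from nltk.tokenize import word_tokenize

    # lowercase the keywords too, so they can match the lowercased tokens
    lower_keywords = {k.lower() for k in keywords}

    fileinF = []
    for sent in fileinE:
        # tokenize and lowercase tokens of the sentence
        tokenized_sent = [word.lower() for word in word_tokenize(sent)]
        # if any item in the tokenized sentence is a keyword, append the whole sentence
        if any(keyw in tokenized_sent for keyw in lower_keywords):
            fileinF.append(sent)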
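The loop above collects the stopword-stripped sentences from fileinE. If you want the sentences as they originally appeared, one option is to test the cleaned sentence but return its counterpart from fileinC; this works because fileinE is built one-to-one from fileinC, so the two lists line up index by index. The sketch below also lowercases both keywords and tokens and uses a set intersection for the lookup. `sentences_with_keywords` and `simple_tokenize` are hypothetical names, and `simple_tokenize` is a regex stand-in for `word_tokenize` so the example runs without NLTK data.

```python
import re
from typing import Callable, Iterable, List


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for nltk.word_tokenize: runs of word characters and hyphens."""
    return re.findall(r"[\w-]+", text)


def sentences_with_keywords(
    original: List[str],
    cleaned: List[str],
    keywords: Iterable[str],
    tokenize: Callable[[str], List[str]] = simple_tokenize,
) -> List[str]:
    """Return the original sentences whose cleaned version contains any keyword.

    `original` and `cleaned` must be aligned index by index, as fileinC and
    fileinE are in the question.
    """
    wanted = {k.lower() for k in keywords}  # set for fast, case-insensitive lookup
    result = []
    for orig, clean in zip(original, cleaned):
        tokens = {t.lower() for t in tokenize(clean)}
        if wanted & tokens:  # non-empty intersection means at least one keyword matched
            result.append(orig)
    return result


# Hypothetical example data.
original = [
    "We are affirming our updated 2020 range guidance.",
    "The impact on the zone was limited.",
    "Now I'll turn it over.",
]
cleaned = [
    "affirming updated 2020 range guidance",
    "impact zone limited",
    "turn",
]
print(sentences_with_keywords(original, cleaned, ["Impact"]))
# ['The impact on the zone was limited.']
```

In the question's script, the call would be `sentences_with_keywords(fileinC, fileinE, keywords, tokenize=word_tokenize)`. Matching is on whole tokens, so a keyword such as "zone" will not match "zones".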
