
Extract a sentence using Python

I would like to extract the exact sentence in which a particular word appears. Could anyone tell me how to do this with Python? I used concordance(), but it only prints the lines where the word matches.

Just a quick reminder: sentence breaking is actually a pretty complex thing. There are exceptions to the period rule, such as "Mr." or "Dr.", and there is also a variety of sentence-ending punctuation marks. But there are also exceptions to the exception (for example, if the next word is capitalized and is not a proper noun, then "Dr." can end a sentence).

If you're interested in this (it's a natural language processing topic), you could check out the Natural Language Toolkit's (nltk) punkt module.
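To make the abbreviation problem concrete, here is a minimal standard-library sketch. It is not how punkt works; it only assumes a tiny hand-written abbreviation list (ABBREVIATIONS and split_sentences are names made up for this illustration), and it does not handle the "exception to the exception" described above.

```python
import re

# Assumption: a tiny hand-picked abbreviation list, for illustration only.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr."}

def split_sentences(text):
    """Split after ., ! or ? followed by whitespace, unless the token is a known abbreviation."""
    sentences = []
    start = 0
    for m in re.finditer(r"[.!?]+\s+", text):
        # Last whitespace-separated token before this boundary, e.g. "Dr."
        tokens = text[start:m.end()].split()
        last_token = tokens[-1].lower() if tokens else ""
        if last_token in ABBREVIATIONS:
            continue  # treat as part of the current sentence, not a boundary
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

for s in split_sentences("Dr. Smith met Mr. Jones. They talked! Was it good? Yes."):
    print(s)
```

A sentence that genuinely ends in "Dr." would be glued to the next one here, which is exactly the kind of case a trained tokenizer such as punkt is meant to handle.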

If you have each sentence in its own string, you can use find() to look for your word and return the sentence if it is found. Otherwise you could use a regex, something like this:

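import re

# Raw string, so the backslashes reach the regex engine unchanged.
# yourwholetext is the full text you want to search.
pattern = r"\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, yourwholetext)
if match is not None:
    sentence = match.group("sentence")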

I haven't tested this, but something along those lines should work.

My test:

import re

text = "muffins are good, cookies are bad. sauce is awesome, veggies too. fmooo mfasss, fdssaaaa."
pattern = r"\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, text)
if match is not None:
    # Prints the first sentence, without its closing period.
    print(match.group("sentence"))
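The find() idea mentioned in this answer can be sketched as follows, assuming the text has already been split into one string per sentence. The function names are hypothetical, and the second variant is an addition that is not in the answer: it matches whole words only, so that "good" does not also match "goodness".

```python
import re

def sentences_containing(sentences, word):
    """Return the sentences that contain `word` as a substring (via str.find)."""
    return [s for s in sentences if s.find(word) != -1]

def sentences_with_whole_word(sentences, word):
    """Same idea, but match `word` only as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

sentences = [
    "muffins are good, cookies are bad.",
    "sauce is awesome, veggies too.",
    "goodness knows why.",
]
print(sentences_containing(sentences, "good"))       # substring match
print(sentences_with_whole_word(sentences, "good"))  # whole-word match
```

The substring version also returns the "goodness" sentence, while the whole-word version does not.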

dutt did a good job answering this. I just wanted to add a couple of things.

import re

text = "go directly to jail. do not cross go. do not collect $200."
# The leading \. means a match can only start after an earlier sentence's period.
pattern = r"\.(?P<sentence>.*?(go).*?)\."
match = re.search(pattern, text)
if match is not None:
    sentence = match.group("sentence")

Obviously, you'll need to import the regex library (import re) before you begin. Here is a teardown of what the regular expression actually does (more info can be found on the Python re library page):

\. # looks for the period that precedes the sentence.
(?P<sentence>...) # stores the captured text in a group named "sentence".
.*? # matches as little text as possible (non-greedy) up to the word "go".
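The same teardown can be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of the pattern from this answer, run on its sample text:

```python
import re

pattern = re.compile(
    r"""
    \.                  # a literal period ending the previous sentence
    (?P<sentence>       # start of the group named "sentence"
        .*?             # as little text as possible, up to ...
        (go)            # ... the word "go"
        .*?             # then as little text as possible, up to ...
    )                   # end of the named group
    \.                  # ... the period that ends this sentence
    """,
    re.VERBOSE,
)

text = "go directly to jail. do not cross go. do not collect $200."
match = pattern.search(text)
sentence = match.group("sentence") if match else None
print(repr(sentence))  # ' do not cross go'
```

Note that the first sentence also contains "go" but is not returned, because the pattern requires a period before the captured text.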

Again, the link to the library reference page is key.
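A possible variation, not taken from either answer: because "." in a regex also matches a period, a `.*?` can run across sentence boundaries when the word is not in the first sentence. The sketch below uses `[^.!?]*` instead, so a match cannot cross sentence-ending punctuation, and re.finditer to collect every matching sentence. The function name find_sentences is hypothetical, and abbreviations such as "Dr." will still split sentences incorrectly.

```python
import re

def find_sentences(text, word):
    """Return every sentence in `text` that contains `word` as a whole word."""
    # [^.!?]* cannot cross ., ! or ?, so each match stays inside one sentence.
    pattern = re.compile(
        r"[^.!?]*\b" + re.escape(word) + r"\b[^.!?]*[.!?]",
        re.IGNORECASE,
    )
    return [m.group(0).strip() for m in pattern.finditer(text)]

text = "go directly to jail. do not cross go. do not collect $200."
print(find_sentences(text, "go"))
# ['go directly to jail.', 'do not cross go.']
```

Unlike re.search, this returns all matching sentences, including the first one, and keeps the closing punctuation.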
