
Regex to find everything in sentence containing word

I am trying to figure out how to find the sentence containing a certain word. Let's say the word is 'wow'; then the four following strings

\nOkay hold on. This is pretty wow in here. Okay.\n

\nThis is super wow. Doesn't get much more wow than that.\n

\nHold up. wow.\n

\nOkay wow. Just wow!\n

would yield the following respectively:

This is pretty wow in here

This is super wow.

wow.

Okay wow.

I am doing this in Python 3, so I could fall back on if statements, but that is messy and I am hoping to avoid it. Here is my code, which was working but has started failing. Maybe I am just too bad at regex and am overcomplicating this.

    import re

    text = node.getIntroText()
    # Try progressively looser patterns until one of them matches.
    m = re.search(r'(?:(\.\s[A-Z]))(?=(.*)' + name + r'([^a-zA-Z]))([^.]*)(\.\s[A-Z])', text)
    if m is None:
        m = re.search(r'(?:(\.\s[A-Z]))(?=(.*)' + name + r'([^a-zA-Z]))(.*)(\.\s[A-Z])', text)
    if m is None:
        m = re.search(r'(?:([\r\n]))(?=(.*)' + name + r'([^a-zA-Z]))([^.]*)(\.\s[A-Z])', text)

Essentially, I want to capture everything from the first period or newline before 'name' up to the next period that is followed by a space and anything but a letter, or by a newline.

Converting my comment to an answer. You may use this regex:

>>> import re
>>> reg = re.compile(r"^(?:(?:(?!\bwow\b)[^.\n])*\. +)*((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*)(?=\.)", re.MULTILINE | re.IGNORECASE)
>>> test_str = ("\n"
...     "Okay hold on. This is pretty wow in here. Okay.\n\n"
...     "This is super wow. Doesn't get much more wow than that.\n\n"
...     "Hold up. wow.\n\n"
...     "Okay wow. Just Wow!\n")
>>> print(reg.findall(test_str))
['This is pretty wow in here', 'This is super wow', 'wow', 'Okay wow']

RegEx Demo

RegEx Explanation:

  • ^ : Start of a line (re.MULTILINE is set)
  • (?:(?:(?!\bwow\b)[^.\n])*\. +)* : Match 0 or more sentences that don't contain wow.
  • ((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*) : Match a sentence containing the word wow and capture it in group 1.
  • (?=\.) : Assert that there is a dot at the next position.
  • Modes re.MULTILINE | re.IGNORECASE make ^ match at the start of every line and make the match case-insensitive.
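
To tie the pieces of the explanation together, here is a minimal sketch that builds the same pattern for an arbitrary word instead of the hard-coded wow. The helper name sentences_with is hypothetical, and escaping the word with re.escape() is an added assumption rather than part of the answer above; it is only meant for plain words, since \b behaves differently around punctuation.

```python
import re

def sentences_with(word, text):
    """Return the sentences in `text` that contain `word` (hypothetical helper)."""
    w = re.escape(word)  # assumption: the word should be matched literally
    pattern = re.compile(
        # skip sentences without the word, capture the one with it,
        # and require a dot right after the captured sentence
        rf"^(?:(?:(?!\b{w}\b)[^.\n])*\. +)*((?:[a-z][^.\n]*?)?\b{w}\b[^.\n]*)(?=\.)",
        re.MULTILINE | re.IGNORECASE,
    )
    return pattern.findall(text)

print(sentences_with("wow", "Okay hold on. This is pretty wow in here. Okay.\n"))
# ['This is pretty wow in here']
```

Because of re.MULTILINE, each line of the input is treated as its own paragraph, so at most one sentence per line is returned.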

Calling re.sub() makes life simple:

# Raw strings keep \b as a word boundary; \1 is Python's syntax for group 1.
wowSentence = re.sub(r'.*?(?:^|\. *)([^.]*\bwow\b[^.]*).*', r'\1', paragraph)

See live demo.

Add (?i) to the front of the regex to match wow case-insensitively.
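
As a rough sketch of the substitution approach with (?i) prepended, the snippet below runs it over single-line paragraphs like the ones in the question. The function name wow_sentence is hypothetical. Note that Python's re module expects \1 (not $1) in the replacement string and that the pattern should be a raw string so \b stays a word boundary.

```python
import re

# (?i) at the very start makes the whole pattern case-insensitive.
PATTERN = re.compile(r'(?i).*?(?:^|\. *)([^.]*\bwow\b[^.]*).*')

def wow_sentence(paragraph):
    """Replace the whole paragraph with the first sentence containing 'wow'."""
    return PATTERN.sub(r'\1', paragraph)

samples = [
    "Okay hold on. This is pretty wow in here. Okay.",
    "This is super wow. Doesn't get much more wow than that.",
    "Hold up. wow.",
    "Okay Wow. Just wow!",
]

for paragraph in samples:
    print(wow_sentence(paragraph))
```

A paragraph that does not contain the word at all is returned unchanged, so callers may want to check for that case separately.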
