
How to match patterns in one sentence using regex in Python?

Here are two examples:

1. I need to take this apple. I just finished the first one.

2. I need to get some sleep. apple is not working.

I want to match text in which need and apple appear in the same sentence. The pattern need.*apple matches both examples, but I want it to match only the first one. How do I change the pattern, or are there other string methods in Python that can do this?

The comment posted by @ctwheels about splitting on '.' and then testing whether each piece contains apple and need is a good approach, and it does not require regular expressions at all. I would, however, first split each sentence again on whitespace and then test for the words in the resulting list, so that you do not match applesauce as if it were apple. But here is a regex solution:

import re

text = """I need to take this apple. I just finished the first one.
I need to get some sleep. apple is not working."""

regex = re.compile(r"""
    [^.]*           # match 0 or more non-period characters
    (
        \bneed\b    # match 'need' on a word boundary
        [^.]*       # match 0 or more non-period characters
        \bapple\b   # match 'apple' on a word boundary
      |             # or
        \bapple\b   # match 'apple' on a word boundary
        [^.]*       # match 0 or more non-period characters
        \bneed\b    # match 'need' on a word boundary
    )
    [^.]*           # match 0 or more non-period characters
    \.              # match a period
    """, flags=re.VERBOSE)

for m in regex.finditer(text):
    # strip() drops whitespace (e.g. a newline) carried over from the previous sentence
    print(m.group(0).strip())

Prints:

I need to take this apple.
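For comparison, the non-regex approach from @ctwheels' comment, combined with the extra whitespace split suggested above, might look like the following minimal sketch. `sentences_with_words` is a hypothetical helper name, and the sketch assumes (like the regex) that every period ends a sentence and that matching is case-sensitive.

```python
import string

text = """I need to take this apple. I just finished the first one.
I need to get some sleep. apple is not working."""


def sentences_with_words(text, required):
    """Hypothetical helper: return the sentences that contain every word in `required`."""
    matches = []
    # Assumption: every '.' ends a sentence (naive split, same limitation as the regex).
    for sentence in text.split("."):
        # Split on whitespace and strip surrounding punctuation, so "apple," counts
        # as "apple" while "applesauce" stays a different word.
        words = {w.strip(string.punctuation) for w in sentence.split()}
        if required <= words:  # subset test: all required words are present
            matches.append(sentence.strip() + ".")
    return matches


print(sentences_with_words(text, {"need", "apple"}))
```

Because the test is a subset check on whole words, a sentence such as "I need applesauce." is not reported, which is exactly the case the extra whitespace split guards against.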

The problem with both of these solutions arises when a sentence contains a period that does not end the sentence, for example "I need to take John Q. Public's apple." In that case you need a more powerful mechanism for dividing the text into sentences. The regex that then operates on each individual sentence does become simpler, but splitting each sentence on whitespace still seems to make the most sense.
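To illustrate how the pieces fit together once sentences are split more carefully, here is a minimal heuristic sketch. The splitting rule is an assumption, not a general solution: a period ends a sentence unless it directly follows a single capital letter, as in the initial "Q.". Abbreviations such as "Dr." or "e.g." would still be split incorrectly. `split_sentences` and `contains_all` are hypothetical helper names.

```python
import re
import string

# Heuristic (assumption): a period followed by whitespace or end of text ends a
# sentence, unless it directly follows a single capital letter such as "Q".
SENTENCE_END = re.compile(r"(?<!\b[A-Z])\.(?:\s+|$)")


def split_sentences(text):
    """Hypothetical helper: split text into sentences, keeping initials like 'Q.' inside."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def contains_all(sentence, required):
    """Hypothetical helper: whitespace split plus punctuation stripping, then a subset test."""
    words = {w.strip(string.punctuation) for w in sentence.split()}
    return required <= words


text = ("I need to take John Q. Public's apple. I just finished the first one. "
        "I need to get some sleep. apple is not working.")

for sentence in split_sentences(text):
    if contains_all(sentence, {"need", "apple"}):
        print(sentence)  # prints: I need to take John Q. Public's apple
```

The lookbehind `(?<!\b[A-Z])` is fixed-width (one character), which Python's `re` module requires for lookbehind assertions.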
