Here are two examples:
1. I need to take this apple. I just finished the first one.
2. I need to get some sleep. apple is not working.
I want to match text that contains need and apple in the same sentence. The pattern need.*apple matches both examples, but I want it to match only the first one. How do I change the pattern, or are there other string methods in Python I could use?
The comment posted by @ctwheels, which suggests splitting on a period (.) and then testing whether each piece contains apple and need, is a good one that does not require regular expressions. I would, however, first split each piece again on white space and test the two words against the resulting list, to ensure you do not match against applesauce. But here is a regex solution:
import re
text = """I need to take this apple. I just finished the first one.
I need to get some sleep. apple is not working."""
regex = re.compile(r"""
[^.]* # match 0 or more non-period characters
(
\bneed\b # match 'need' on a word boundary
[^.]* # match 0 or more non-period characters
\bapple\b # match 'apple' on a word boundary
| # or
\bapple\b # match 'apple' on a word boundary
[^.]* # match 0 or more non-period characters
\bneed\b # match 'need' on a word boundary
)
[^.]* # match 0 or more non-period characters
\. # match a period
""", flags=re.VERBOSE)
for m in regex.finditer(text):
    print(m.group(0))
Prints:
I need to take this apple.
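For comparison, the non-regex approach described above (split on periods, then on white space) could be sketched roughly as follows. The function name sentences_with_both and the set of punctuation characters stripped from each word are assumptions made for illustration, not part of the original comment.

```python
def sentences_with_both(text, first="need", second="apple"):
    """Return the sentences of `text` that contain both whole words."""
    matches = []
    for sentence in text.split("."):
        # Split on white space and strip surrounding punctuation, so that
        # "apple," counts as "apple" but "applesauce" does not.
        words = {w.strip(",;:!?\"'") for w in sentence.split()}
        if first in words and second in words:
            matches.append(sentence.strip() + ".")
    return matches


text = """I need to take this apple. I just finished the first one.
I need to get some sleep. apple is not working."""

print(sentences_with_both(text))  # ['I need to take this apple.']
```

Like the regex, this comparison is case-sensitive and treats every period as the end of a sentence.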
Both of these solutions break down when a sentence contains a period used for something other than ending the sentence, as in: I need to take John Q. Public's apple.
In that case you need a more powerful mechanism for dividing the text into sentences. The regex that operates on each sentence then becomes simpler, since it no longer has to exclude periods, although splitting on white space still seems to make the most sense.
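As a rough illustration of what a slightly more careful sentence splitter might look like, here is a heuristic sketch using only the standard library. It assumes that a period ends a sentence only when it is followed by white space or the end of the text, and is not preceded by a single capital letter (an initial such as Q.). The names SENTENCE_END, split_sentences and has_both are hypothetical, and the heuristic still fails on abbreviations such as "Dr." or on a sentence that ends with the word "I"; a dedicated sentence tokenizer would be more robust.

```python
import re

# Assumption: a period is a sentence boundary unless the character before it
# is a stand-alone capital letter (e.g. the initial in "John Q. Public").
SENTENCE_END = re.compile(r"(?<!\b[A-Z])\.(?:\s+|$)")


def split_sentences(text):
    """Split text into sentences (without their final period)."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def has_both(sentence, first="need", second="apple"):
    """Check for both whole words by splitting the sentence on white space."""
    words = {w.strip(",;:!?\"'") for w in sentence.split()}
    return first in words and second in words


sample = ("I need to take John Q. Public's apple. "
          "I need to get some sleep. apple is not working.")

for sentence in split_sentences(sample):
    if has_both(sentence):
        print(sentence)  # I need to take John Q. Public's apple
```

Once the text is divided this way, the per-sentence test no longer needs to reason about periods at all.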