Match smallest possible sentence
Text:
One sentence here, much wow. Another one here. This is O.N.E. example n. 1, a nice one to understand. Hope it's clear now!
Regex:
(?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])
Result:
Another one here. This is O.N.E. example n. 1, a nice one to understand.
How can I obtain
This is O.N.E. example n. 1, a nice one to understand.
instead (i.e. the smallest possible sentence that matches the regex)?
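To make the problem reproducible, here is a minimal sketch in Python (an assumption; the question does not name a language, and the variable names are only illustrative). It shows that the lazy .+? only shortens the right-hand end of the match, while the regex engine still starts at the leftmost position where a match is possible:

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

# Pattern from the question. The first position that satisfies the
# lookbehind and [A-Z] is "Another", so the match begins there.
pattern = r"(?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])"

match = re.search(pattern, text)
print(match.group(0))
# Another one here. This is O.N.E. example n. 1, a nice one to understand.
```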
Just insert a greedy .* in front of the expression and read the sentence from capture group 1:
.*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))
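As a rough check of this pattern, here is a small Python sketch (Python is an assumption, the answer does not say which engine it targets). The greedy .* consumes as much as it can and backtracks only as far as needed, so the group starts at the last possible sentence start:

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

# The leading greedy .* pushes the start of group 1 as far right as possible.
pattern = r".*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))"

match = re.search(pattern, text)
if match:
    # The wanted sentence is in group 1, not in the whole match.
    print(match.group(1))
    # This is O.N.E. example n. 1, a nice one to understand.
```

Note that . does not match a newline by default in Python, so text spanning several lines would probably need re.DOTALL for the leading .* to reach across them.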
Here is a slightly different approach: split the entire text into sentences and then filter out the one you are after:
import re
s = "One sentence here, much wow. Another one here. This is O.N.E. example n. 1, a nice one to understand. Hope it's clear now!"
result = [x for x in re.split(r'(?<=\B.\.)\s*',s) if 'nice one' in x][0]
print(result) # This is O.N.E. example n. 1, a nice one to understand.
Not sure how many edge cases you have, but here I used re.split() with the following pattern: (?<=\B.\.)\s*
This would mean:
(?<=\B.\.) - A positive lookbehind asserting that the current position is preceded by a position where \b (a word boundary) does not apply, then any single character, then a literal dot.
\s* - 0+ whitespace characters.
In the sample text this means a dot only splits when it follows a word of at least two characters, so "n." and the dots in "O.N.E." do not cause a split.
With the resulting list it won't be much of a problem to check which element holds your desired words "nice one".
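To see what the split actually produces, the following sketch prints the intermediate list and wraps the filter in a small helper. The helper name sentences_containing and the constant SENTENCE_BREAK are made up for illustration; returning a list instead of indexing with [0] is one way to avoid an IndexError when no sentence contains the phrase:

```python
import re

# Same pattern as above, precompiled.
SENTENCE_BREAK = re.compile(r"(?<=\B.\.)\s*")

def sentences_containing(text, phrase):
    """Return every split piece of `text` that contains `phrase`."""
    return [piece for piece in SENTENCE_BREAK.split(text) if phrase in piece]

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

pieces = SENTENCE_BREAK.split(text)
for piece in pieces:
    print(repr(piece))

print(sentences_containing(text, "nice one"))
print(sentences_containing(text, "no such phrase"))  # empty list, no IndexError
```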
You could exclude matching a dot, and only match the dot in case of an uppercase char followed by a dot, or a dot followed by a space and a digit.
(?:(?<=\.\s)|^)[A-Z][^.A-Z]*(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*\bnice one\b.+?(?=\s[A-Z])
(?:(?<=\.\s)|^) - Assert a . and a whitespace char to the left, or the start of the string
[A-Z][^.A-Z]* - Match an uppercase char A-Z and 0+ times any char except a dot or uppercase char
(?: - Non-capture group
(?:[A-Z]\.|\.\s\d) - Match either A-Z and . or match . whitespace char and digit
[^.A-Z]* - Optionally match any char except a . or uppercase char
)* - Close group and optionally repeat
\bnice one\b.+?(?=\s[A-Z]) - Match nice one and match until asserting a whitespace char and uppercase char to the right
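A quick way to try the pattern is a Python sketch like the one below (Python is an assumption; the pattern is only split across several string literals so each part can carry a comment):

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

pattern = re.compile(
    r"(?:(?<=\.\s)|^)"                   # just after ". ", or at the start of the string
    r"[A-Z][^.A-Z]*"                     # uppercase char, then no dots or uppercase chars
    r"(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*"   # allow "X." and ". <digit>" inside the sentence
    r"\bnice one\b.+?(?=\s[A-Z])"        # the phrase, then up to whitespace + uppercase
)

match = pattern.search(text)
print(match.group(0) if match else None)
# This is O.N.E. example n. 1, a nice one to understand.
```

Because of the final lookahead, the pattern needs another sentence starting with an uppercase char after the one you want; if the wanted sentence is the last one in the text, this sketch would print None.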