Match the smallest possible sentence

Question

Text:

One sentence here, much wow. Another one here. This is O.N.E. example n. 1, a nice one to understand. Hope it's clear now!

Regex: (?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])

Result: Another one here. This is O.N.E. example n. 1, a nice one to understand.

How can I obtain "This is O.N.E. example n. 1, a nice one to understand." (i.e. the smallest possible sentence that matches the regex)?
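
The behavior described in the question can be reproduced with a short Python sketch. The variable names are illustrative only. The point it shows is that lazy quantifiers shorten a match from the right, while the engine still takes the leftmost position where a match can start.

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

# Pattern from the question. The first position preceded by ". " and
# followed by an uppercase letter is the "A" of "Another", so the match
# starts there even though .+? is lazy.
pattern = r"(?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])"

match = re.search(pattern, text)
print(match.group(0))
# Another one here. This is O.N.E. example n. 1, a nice one to understand.
```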

Answer 1

Just insert a greedy .* in front of the expression. The sentence you want is then in capture group 1:

.*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))
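
A minimal sketch of how this pattern might be used from Python, assuming the sample text from the question. The greedy .* first consumes the whole line and then backtracks, so the engine settles on the last ". " from which the rest of the pattern can still match.

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

# Greedy prefix, then the original expression wrapped in a capture group.
pattern = r".*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))"

match = re.search(pattern, text)
sentence = match.group(1)  # group 0 would also include the greedy prefix
print(sentence)
# This is O.N.E. example n. 1, a nice one to understand.
```

Note that the pattern requires a ". " before the sentence, so it would not match if the wanted sentence were the very first one in the text.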

Answer 2

Here is a slightly different approach: split the entire text into sentences, then filter out the one you are after:

import re
s = "One sentence here, much wow. Another one here. This is O.N.E. example n. 1, a nice one to understand. Hope it's clear now!"
result = [x for x in re.split(r'(?<=\B.\.)\s*',s) if 'nice one' in x][0]
print(result) # This is O.N.E. example n. 1, a nice one to understand.

I'm not sure how many edge cases you have, but here I used re.split() with the following pattern: (?<=\B.\.)\s*. This would mean:

(?<=\B.\.) - A positive lookbehind asserting that the current position is preceded by a spot where \b (a word boundary) does not apply, then any single character, then a literal dot. In effect, a dot that follows a single-character token (as in O.N.E. or n.) does not trigger a split.
\s* - 0+ whitespace characters.

With the resulting list, it is easy to check which element holds your desired words "nice one".
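
As a hedged variation on the snippet above, the lookup can be wrapped in a small helper that returns None instead of raising IndexError when no sentence contains the phrase. The function name find_sentence is hypothetical, and the sketch assumes Python 3.7 or later, where re.split() supports patterns that can match an empty string.

```python
import re

def find_sentence(text, phrase):
    """Return the first split chunk containing phrase, or None if absent."""
    # Same split pattern as in the answer above.
    sentences = re.split(r"(?<=\B.\.)\s*", text)
    return next((s for s in sentences if phrase in s), None)

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

print(find_sentence(text, "nice one"))
# This is O.N.E. example n. 1, a nice one to understand.
print(find_sentence(text, "not present"))
# None
```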

See an online demo

Answer 3

You could exclude matching a dot in general, and only allow a dot when it follows an uppercase char, or when it is followed by a whitespace char and a digit.

(?:(?<=\.\s)|^)[A-Z][^.A-Z]*(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*\bnice one\b.+?(?=\s[A-Z])

(?:(?<=\.\s)|^) Assert a . and whitespace char to the left, or the start of the string
[A-Z][^.A-Z]* Match an uppercase char A-Z and 0+ times any char except a dot or uppercase char
(?: Non capture group
- (?:[A-Z]\.|\.\s\d) Match either A-Z and . or match . whitespace char and digit
- [^.A-Z]* Optionally match any char except a . or uppercase char
)* Close group and optionally repeat
\bnice one\b.+?(?=\s[A-Z]) Match nice one and match until asserting a whitespace char and uppercase char to the right
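
The breakdown above can be tried out with a short Python sketch. The pattern is the one from this answer, only split across adjacent string literals for readability. The sample text is taken from the question, and the variable names are illustrative.

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

pattern = re.compile(
    r"(?:(?<=\.\s)|^)"                  # preceded by ". " or at string start
    r"[A-Z][^.A-Z]*"                    # uppercase char, then no dots/uppercase
    r"(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*"  # permitted dots: "X." or ". <digit>"
    r"\bnice one\b.+?(?=\s[A-Z])"       # the phrase, then up to " <Uppercase>"
)

match = pattern.search(text)
print(match.group(0))
# This is O.N.E. example n. 1, a nice one to understand.
```

Because of the final lookahead, this sketch only matches when another sentence starting with an uppercase letter follows.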

Regex demo

Answer 1: score 2, posted 2021-05-03 12:57:18
Answer 2: score 2, posted 2021-05-03 13:01:27
Answer 3: score 1, accepted, posted 2021-05-03 13:00:23
