How to capture an unknown number of words between a negative lookbehind term and the matched term in a regex?

Question

I am trying to exclude records which have the word "owner" somewhere before the word "dog":

the owner has a dog (exclude)
the owner has a black and brown dog (exclude)
John has a dog (include)
John has a black and brown dog (include)

Here is my current regex:

\b(?<!owner\s)\w+\sdog\b

This works for a single unknown word ('owner has dog' is excluded, but 'owner has a dog' is included). However, I am unable to match multiple words while keeping the negative lookbehind in effect across all the words between "owner" and "dog".
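For reference, the behaviour described above can be reproduced with a short Python check. This is only a sketch: the sample strings and the names attempt, samples and wrongly_kept are illustrative and not part of the original data.

```python
import re

# The attempted pattern: the lookbehind only inspects the six characters
# immediately before the single word that precedes "dog".
attempt = re.compile(r"\b(?<!owner\s)\w+\sdog\b")

samples = [
    "owner has dog",                        # one word in between
    "owner has a dog",                      # two words in between
    "the owner has a black and brown dog",  # many words in between
]

# Records that still match even though "owner" appears before "dog".
wrongly_kept = [s for s in samples if attempt.search(s)]
print(wrongly_kept)
```

Only the first sample is rejected, because "owner " sits directly before the word that precedes "dog"; with more words in between, the lookbehind no longer sees "owner".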

Many thanks.

Answer 1

You can use the following regular expression to verify that the string contains the word "dog" that is not preceded by the word "owner".

^(?:(?!\bowner\b).)*\bdog\b


Python's regex engine performs the following operations.

^                : anchor match to beginning of string
(?:              : begin a non-capture group
  (?!\bowner\b)  : use a negative lookahead to assert that the current
                   position in the string is not followed by "owner"
  .              : match a character
)                : end non-capture group
*                : execute non-capture group 0+ times
\bdog\b          : match 'dog' surrounded by word boundaries

The technique of matching a sequence of individual characters that do not begin an outlawed word is called the Tempered Greedy Token Solution.
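As a minimal sketch of how this pattern could be used to filter the records from the question (the names NO_OWNER_BEFORE_DOG, records and kept are made up for illustration):

```python
import re

# Tempered greedy token: consume characters from the start of the string,
# but never step onto a position where the whole word "owner" begins,
# then require the whole word "dog".
NO_OWNER_BEFORE_DOG = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

records = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
]

# Keep only the records where "dog" appears with no "owner" before it.
kept = [r for r in records if NO_OWNER_BEFORE_DOG.search(r)]
print(kept)
```

Note that "owner" appearing after "dog" does not block a match, since the pattern stops at "dog" and never inspects the rest of the string.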

Answer 2

Another option could be to start by matching any char except o or a newline.

Then, whenever you encounter an o, assert that it does not start the word owner, match the o, and continue matching any char except o or a newline. Repeat that process as often as needed until you can match the word dog.

 ^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b

Explanation

^ Start of string
[^o\r\n]* Match 0+ times any char except o or a newline
(?: Non capture group
- (?!\bowner\b) Negative lookahead, assert that the word owner is not directly to the right
- o[^o\r\n]* Match o followed by 0+ times any char except o or a newline
)* Close non capture group and repeat 0+ times
\bdog\b Match the word dog
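
The pattern explained above could be tried in Python roughly as follows. This is a sketch under the assumption that each record is a single line; the names UNROLLED, records and kept are hypothetical.

```python
import re

# Unrolled variant: the lookahead is only evaluated at each "o",
# rather than before every single character.
UNROLLED = re.compile(r"^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b")

records = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
]

# "John" and "brown" contain an o, but not at the start of the word "owner",
# so the lookahead lets the match continue past them.
kept = [r for r in records if UNROLLED.search(r)]
print(kept)
```

Because the lookahead runs only at positions holding an o, this form typically does fewer checks than asserting before every character, at the cost of a less readable pattern.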
