How to capture an unknown number of words between a negative lookbehind term and the matched word in a regex?

I am trying to exclude records which have the word "owner" somewhere preceding the word "dog":

  • the owner has a dog (exclude)
  • the owner has a black and brown dog (exclude)
  • John has a dog (include)
  • John has a black and brown dog (include)

Here is my current regex:

\b(?<!owner\s)\w+\sdog\b

This works when a single unknown word sits between the two ('owner has dog' is excluded, but 'owner has a dog' is included). However, I am unable to capture multiple words while keeping the negative lookbehind in effect across all the words between "owner" and "dog".
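To make the limitation concrete, here is a minimal sketch that runs the regex above against a few strings. The sample strings and the variable names (`current`, `samples`) are only for illustration.

```python
import re

# The pattern from the question. The lookbehind only inspects the six
# characters immediately before the single word that precedes "dog".
current = re.compile(r"\b(?<!owner\s)\w+\sdog\b")

samples = [
    "owner has dog",
    "owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
]

for text in samples:
    status = "included" if current.search(text) else "excluded"
    print(f"{text!r}: {status}")
```

Only the first string is excluded. As soon as more than one word separates "owner" from "dog", the lookbehind no longer sees "owner".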

Many thanks

You can use the following regular expression to verify that the string contains the word "dog" with no occurrence of the word "owner" anywhere before it.

^(?:(?!\bowner\b).)*\bdog\b

Start your engine! | Python code

Python's regex engine performs the following operations.

^                : anchor match to beginning of string
(?:              : begin a non-capture group
  (?!\bowner\b)  : use a negative lookahead to assert that the current
                   position in the string is not followed by "owner"
  .              : match a character
)                : end non-capture group
*                : execute non-capture group 0+ times
\bdog\b          : match 'dog' surrounded by word boundaries

The technique of matching a sequence of individual characters, none of which begins an outlawed word, is called the Tempered Greedy Token Solution.
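As a rough sketch, the pattern can be used to filter the four records from the question like this. The names `pattern`, `records` and `included` are illustrative only.

```python
import re

# Tempered greedy token: consume one character at a time, but only while
# the word "owner" does not start at the current position.
pattern = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

records = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
]

# Keep only the records where "dog" is reached without passing "owner".
included = [record for record in records if pattern.search(record)]
print(included)
# ['John has a dog', 'John has a black and brown dog']
```

Because the pattern is anchored with ^, `re.search` and `re.match` behave the same way here.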

Another option could be to start by matching any char except o or a newline.

Then, whenever you encounter an o, assert that it does not start the word owner, match the o, and continue matching any char except o or a newline. Repeat that step as many times as needed until you match the word dog.

 ^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b

Explanation

  • ^ Start of string
  • [^o\r\n]* Match 0+ times any char except o or a newline
  • (?: Non capture group
    • (?!\bowner\b) Negative lookahead, assert that the word owner is not directly to the right
    • o[^o\r\n]* Match o followed by 0+ times any char except o or a newline
  • )* Close the non capture group and repeat it 0+ times
  • \bdog\b Match the word dog

Regex demo | Python demo
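A minimal sketch of this second pattern in Python, checked against the tempered greedy token pattern from the other answer. The variable names and the last sample string ("the dog has an owner", where "owner" comes after "dog") are my own additions for illustration.

```python
import re

# Second pattern: skip runs of non-"o" characters, and only stop to
# check the lookahead when an "o" is reached.
unrolled = re.compile(r"^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b")

# Pattern from the other answer, used here only for comparison.
tempered = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

samples = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
    "the dog has an owner",  # "owner" follows "dog", so it is kept
]

for text in samples:
    keep = unrolled.search(text) is not None
    # Both patterns should make the same decision on these samples.
    assert keep == (tempered.search(text) is not None)
    print(f"{text!r}: {'include' if keep else 'exclude'}")
```

The two patterns accept the same strings here. The second one only evaluates the lookahead at each o rather than at every character, which is why it may take fewer steps on long inputs.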
