How to capture an unknown number of words between a negative lookbehind term and the catch term in a regex?
I am trying to exclude records that have the word "owner" somewhere before the word "dog".
Here is my current regex:
\b(?<!owner\s)\w+\sdog\b
This works when there is a single unknown word in between ('owner has dog' is excluded), but 'owner has a dog' is still included. I can't find a way to allow multiple words so that the negative lookbehind still applies across all the words between "owner" and "dog".
Many thanks
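A quick check with Python's standard `re` module (the sample strings are made up for illustration) shows why the pattern above only works for one intervening word: the lookbehind is fixed-width and only inspects the text directly before the single word that precedes "dog".

```python
import re

# The question's pattern: one word followed by "dog", where that word
# must not be directly preceded by "owner" plus a whitespace character.
original = re.compile(r"\b(?<!owner\s)\w+\sdog\b")

# One word in between: the lookbehind sees "owner " before "has" and rejects it.
print(original.search("owner has dog"))            # None

# Two words in between: the lookbehind only checks the text before "a",
# which is "r has ", so the record slips through.
print(original.search("owner has a dog").group())  # a dog
```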
You can use the following regular expression to verify that the string contains the word "dog" and that it is not preceded by the word "owner".
^(?:(?!\bowner\b).)*\bdog\b
Python's regex engine performs the following operations.
^ : anchor match to beginning of string
(?: : begin a non-capture group
(?!\bowner\b) : use a negative lookahead to assert that the current
position in the string is not followed by "owner"
. : match a character
) : end non-capture group
* : execute non-capture group 0+ times
\bdog\b : match 'dog' surrounded by word boundaries
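To see the steps above in action, here is a small sketch that applies the pattern with Python's `re` module to a few made-up records. Because the pattern is anchored with `^` and `.` does not match a newline, it assumes each record is a single line.

```python
import re

# Tempered greedy token: consume one character at a time, but only if
# the whole word "owner" does not start at the current position.
pattern = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

# Hypothetical sample records.
records = [
    "owner has dog",
    "owner has a dog",
    "the owner has a big brown dog",
    "my dog is friendly",
    "a dog and its owner",
]

for text in records:
    verdict = "keep" if pattern.search(text) else "exclude"
    print(verdict, "|", text)
```

The first three records are excluded no matter how many words sit between "owner" and "dog"; the last two are kept because "dog" comes before any "owner".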
The technique of matching a sequence of individual characters, none of which begins an outlawed word, is called the Tempered Greedy Token Solution.
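Since the technique only depends on the outlawed word and the target word, it can be wrapped in a small builder. The helper name `tempered_pattern` is hypothetical (not part of any library); it is a sketch that escapes both words with `re.escape` and assumes single-line input.

```python
import re


def tempered_pattern(forbidden: str, target: str) -> re.Pattern:
    """Hypothetical helper: match `target` as a whole word only if the
    whole word `forbidden` does not appear anywhere before it."""
    bad = re.escape(forbidden)
    good = re.escape(target)
    # Each "." is tempered by the lookahead, so no step may start `forbidden`.
    return re.compile(rf"^(?:(?!\b{bad}\b).)*\b{good}\b")


cat_rule = tempered_pattern("owner", "cat")
print(bool(cat_rule.search("the owner feeds the cat")))    # False
print(bool(cat_rule.search("the cat ignores its owner")))  # True
```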
Another option is to start by matching any character except o or a newline.
Then, whenever you encounter an o, assert that it does not start the word owner, match the o followed by any characters except o or a newline, and optionally repeat that process until you match the word dog.
^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b
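A minimal sketch of this variant with Python's `re` module, assuming single-line records (the sample strings are made up). Only positions holding an "o" pay for the lookahead; everything else is consumed by the negated character class.

```python
import re

# Runs of non-"o" characters are consumed cheaply; the lookahead is only
# evaluated at positions that actually hold an "o".
unrolled = re.compile(r"^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b")

print(bool(unrolled.search("owner has a dog")))        # False
# Every "o" here is checked, but none of them starts the word "owner".
print(bool(unrolled.search("the cook took the dog")))  # True
```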
Explanation
^ : start of string
[^o\r\n]* : match any character except o or a newline, 0+ times
(?: : begin a non-capture group
(?!\bowner\b) : negative lookahead, assert that the word owner is not directly to the right
o[^o\r\n]* : match o followed by any character except o or a newline, 0+ times
)* : close the non-capture group and repeat it 0+ times
\bdog\b : match the word dog
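The two answers should accept the same single-line records, since the word owner always begins with an o. A quick sketch that checks this on a handful of made-up records and then filters them the way the question asks:

```python
import re

tempered = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")
unrolled = re.compile(r"^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b")

# Hypothetical single-line records.
records = [
    "owner has dog",
    "owner has a dog",
    "the owner of the house has a dog",
    "the downer dog",          # "downer" is not the whole word "owner"
    "a dog and its owner",
    "no pets here",
]

# Both patterns should give the same verdict for every record.
for text in records:
    assert bool(tempered.search(text)) == bool(unrolled.search(text)), text

kept = [text for text in records if unrolled.search(text)]
print(kept)  # ['the downer dog', 'a dog and its owner']
```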