Regex: match a word not immediately preceded by another word, but possibly preceded by that word earlier
I need to match all strings that contain one word from a list, but only if that word is not immediately preceded by another specific word. I have this regex:
.*(?<!forbidden)\b(word1|word2|word3)\b.*
This still matches a sentence like "hello forbidden word1", because forbidden is matched by the leading .* wildcard. But if I remove the .*, I no longer match strings like "hello word1", which I do want to match.
Note that I do want to match a string like "forbidden hello word1".
Could you suggest how to fix this problem?
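A minimal sketch to reproduce the behaviour, assuming Python's re module (the question does not name an engine) and assuming that "match" means the whole string, modelled here with re.fullmatch. It runs the question's pattern with and without the .* wrappers, plus re.search on the bare pattern to show what the lookbehind does on its own. With these inputs the wrapped pattern accepts all three strings, the bare pattern fails a whole-string match on all of them, and re.search still finds word1 in "hello forbidden word1".

```python
import re

# The question's pattern, with the surrounding .* ...
WITH_WRAPPERS = r".*(?<!forbidden)\b(word1|word2|word3)\b.*"
# ... and the same pattern with the .* removed.
WITHOUT_WRAPPERS = r"(?<!forbidden)\b(word1|word2|word3)\b"

samples = ["hello word1", "hello forbidden word1", "forbidden hello word1"]

for s in samples:
    whole = re.fullmatch(WITH_WRAPPERS, s) is not None
    bare_whole = re.fullmatch(WITHOUT_WRAPPERS, s) is not None
    # (?<!forbidden) is tested right before the target word, where the
    # preceding character is a space, so it never sees "forbidden".
    bare_search = re.search(WITHOUT_WRAPPERS, s) is not None
    print(f"{s!r}: fullmatch with .* = {whole}, "
          f"fullmatch without .* = {bare_whole}, search without .* = {bare_search}")
```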
This one seems to work well:
^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$
\b(?!(?:forbidden|word[1-3])\b)\w+
matches one whole word that is neither forbidden nor word[1-3]; the target word must follow it after a single space.
So it matches "hi forbidden hello word1 test" but not "hi hello forbidden word2 test".
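A small Python harness to check the claimed behaviour (an assumption: the answer does not say which engine it was tested with). Besides the two examples above, it tries the question's strings and a bare word1. Because the pattern needs some other word plus a single space before the target, a string whose only target word comes first, such as word1 on its own, is not matched.

```python
import re

# Pattern from the answer; word[1-3] stands for word1, word2 and word3.
PATTERN = re.compile(r"^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$")


def accepts(text: str) -> bool:
    """Return True if the whole line matches PATTERN."""
    return PATTERN.match(text) is not None


for text in [
    "hi forbidden hello word1 test",
    "hi hello forbidden word2 test",
    "hello word1",
    "hello forbidden word1",
    "forbidden hello word1",
    "word1",  # no preceding word, so the \w+ part has nothing to match
]:
    print(f"{text!r}: {accepts(text)}")
```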
If what you want is to match the entire string, try this:
^(.(?<!forbidden (word1|word2|word3)\b))*((?<!forbidden )\b(word1|word2|word3)\b)+(.(?<!forbidden (word1|word2|word3)\b))*$
The idea comes from the thread "Regular expression to match a line that doesn't contain a word".
I've just reversed the order of the lookaround.
^(.(?<!forbidden (word1|word2|word3)\b))*
is there to discard any string that contains the pattern forbidden (word1|word2|word3).
((?<!forbidden )\b(word1|word2|word3)\b)
is the condition you defined.
But I don't quite understand why you need this requirement.
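A Python check of this pattern, assuming Python's re module. All three alternatives have the same length, so each lookbehind has a fixed width and re accepts it. The last case illustrates the "discard any string" behaviour of the outer segments: a string that also contains forbidden word2 is rejected even though word1 appears on its own.

```python
import re

# Outer segments: step over one character at a time, refusing any position
# that completes "forbidden word1/2/3". Middle: a target word not directly
# after "forbidden ".
PATTERN = re.compile(
    r"^(.(?<!forbidden (word1|word2|word3)\b))*"
    r"((?<!forbidden )\b(word1|word2|word3)\b)+"
    r"(.(?<!forbidden (word1|word2|word3)\b))*$"
)


def accepts(text: str) -> bool:
    """Return True if the entire string matches PATTERN."""
    return PATTERN.match(text) is not None


for text in ["hello word1", "hello forbidden word1",
             "forbidden hello word1", "word1 forbidden word2"]:
    print(f"{text!r}: {accepts(text)}")
```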
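Have a look at word boundaries: \bword can never touch a word character to its left. So in (?<!forbidden)\b(word1|word2|word3) the lookbehind is checked where the preceding character is a non-word character (or the start of the string), and it can never see forbidden, which ends in the word character n.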
To match (word1|word2|word3) only if it is not preceded by forbidden and one \W (non-word character):
^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*
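A quick Python check of the single-\W version, assuming Python's re module. The last input has two spaces between forbidden and word1; the fixed-width lookbehind only looks one character back past forbidden, so that string still matches, which is the case the multiple-\W variant addresses.

```python
import re

# (?<!forbidden\W) rejects the target only when exactly one non-word
# character separates it from "forbidden".
PATTERN = re.compile(r"^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*")


def accepts(text: str) -> bool:
    """Return True if PATTERN matches from the start of the string."""
    return PATTERN.match(text) is not None


for text in ["hello word1", "hello forbidden word1",
             "forbidden hello word1", "hello forbidden  word1"]:
    print(f"{text!r}: {accepts(text)}")
```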
If there can be multiple \W between them:
Lookbehinds need to be of fixed length in Python's re module. To get around this, one idea is to put \W* outside the lookbehind, preceded by (?<!\W), to set the position from which to look behind: (?<!\W) keeps that position from falling inside the run of separators, so (?<!forbidden) is applied at the end of the preceding word.
^.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*
Regex101 demo (in the multiline demo I used [^\w\n] instead of \W to avoid skipping over lines)
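A Python sketch of the multiple-\W version. The multiline pattern is an assumption: it replaces both occurrences of \W with [^\w\n], following the remark about the Regex101 demo, which is not reproduced here. With re.M, the plain \W version can let \W* step over a line break and join "hello" with the next line's "word1"; the [^\w\n] version keeps each match on its own line.

```python
import re

# Single-string version from the answer.
SINGLE = re.compile(r"^.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*")

# Assumed multiline variant: separators may not include a newline.
MULTI = re.compile(
    r"^.*?(?<!forbidden)(?<![^\w\n])[^\w\n]*\b(word1|word2|word3)\b.*", re.M
)
# For comparison: the \W version applied line by line with re.M.
MULTI_W = re.compile(SINGLE.pattern, re.M)


def accepts(text: str) -> bool:
    """Return True if SINGLE matches from the start of the string."""
    return SINGLE.match(text) is not None


for s in ["hello word1", "word1", "forbidden hello word1",
          "hello forbidden word1", "hello forbidden  word1", "hello forbidden, word1"]:
    print(f"{s!r}: {accepts(s)}")

text = "hello\nword1\nhello forbidden word1\nforbidden hello word2"
print([m.group(0) for m in MULTI.finditer(text)])
print([m.group(0) for m in MULTI_W.finditer(text)])
```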
A variable-width lookbehind such as (?<!forbidden\W+) would certainly be more comfortable. The PyPI regex module (import regex as re) supports variable-length lookbehind: see this demo.
Note: if you do not need to capture anything, non-capturing groups (?:...) can be used as well.
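A small illustration, assuming Python's re module, using the single-\W pattern from this answer: switching the alternation to a non-capturing group leaves the overall match unchanged but removes the capture.

```python
import re

CAPTURING = re.compile(r"^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*")
NON_CAPTURING = re.compile(r"^.*?\b(?<!forbidden\W)(?:word1|word2|word3)\b.*")

text = "forbidden hello word2"
m1 = CAPTURING.match(text)
m2 = NON_CAPTURING.match(text)

print(m1.groups())                 # ('word2',)
print(m2.groups())                 # ()
print(m1.group(0) == m2.group(0))  # True: same overall match
```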