
Python RegEx - Negative Lookahead not working after a ? quantifier

I'm new to regex, and I want to find all instances of "po" and its variants (e.g. "p.o.") that AREN'T followed by "box", because I'm interested in purchase orders and not PO boxes. The code below isn't working: it still matches the po even when it's followed by "box". Any ideas?

import re

string = " po  pobox  po box  po  box    p.o.  p.o.box  p.o. box  p.o.  box"

re.findall(r' p\.?\s?o\.?(?!\s*box)', string)

# expected output
[' po', ' p.o.']

# actual output
[' po', ' p.o.', ' p.o', ' p.o', ' p.o']

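You placed the lookahead after an optional pattern, and backtracking lets the engine match the string another way: when the lookahead fails after o\.? has consumed the dot, the engine retries with \.? matching nothing, so the lookahead is checked against .box instead of box and succeeds.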
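The backtracking step can be observed directly. The sketch below runs the question's original pattern against a single sample, " p.o.box" (an input picked here for illustration), and shows that the match stops before the second dot:

```python
import re

# Original pattern from the question: lookahead placed after the optional dot.
pattern = re.compile(r' p\.?\s?o\.?(?!\s*box)')

m = pattern.search(" p.o.box")

# First attempt: o\.? consumes the dot, the lookahead sees "box" and fails.
# Retry: \.? matches nothing, the lookahead sees ".box" and passes.
print(repr(m.group()))  # ' p.o'
print(m.end())          # 4, the index of the dot that was given back
```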

Possessive quantifiers make this easy to solve: add + after the \.? that comes before the lookahead, p\.?\s?o\.?+(?!\s*box). That prevents the engine from backtracking into the \.? pattern. Note that Python's re module accepts possessive quantifiers only from Python 3.11 on.

On earlier versions, where re does not support them, move the lookahead right after the o, the obligatory part, and add \.? to the lookahead:

r'p\.?\s?o(?!\.?\s*box)\.?'
          ^^^^^^^^^^^^^
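As a quick check, the pattern above can be run against the string from the question. This is a minimal sketch: the leading space is carried over from the question's pattern, and the possessive variant is guarded by a version test because re accepts ?+ only on Python 3.11 and later.

```python
import re
import sys

string = " po  pobox  po box  po  box    p.o.  p.o.box  p.o. box  p.o.  box"

# Lookahead moved right after the obligatory "o" (leading space kept from the question).
moved = re.findall(r' p\.?\s?o(?!\.?\s*box)\.?', string)
print(moved)  # [' po', ' p.o.']

# Possessive variant: the optional dot can no longer be given back.
if sys.version_info >= (3, 11):
    possessive = re.findall(r' p\.?\s?o\.?+(?!\s*box)', string)
    print(possessive)  # [' po', ' p.o.']
```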

Add \b after box if you plan to match it as a whole word. Likewise, you may want to add a \b before the first p so that p is matched as a whole word.
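The effect of the two word boundaries can be sketched on a small sample. The text and the names loose and strict are made up here for illustration; they are not part of the original answer.

```python
import re

text = "depot po p.o. po box po boxes"

loose = r'p\.?\s?o(?!\.?\s*box)\.?'        # pattern without word boundaries
strict = r'\bp\.?\s?o(?!\.?\s*box\b)\.?'   # \b before p and after box

# Without \b, the "po" inside "depot" is matched as well.
print(re.findall(loose, text))   # ['po', 'po', 'p.o.']

# With \b, "depot" is skipped, and only the whole word "box" blocks a match,
# so the "po" in front of "boxes" is now returned.
print(re.findall(strict, text))  # ['po', 'p.o.', 'po']
```

Whether "po boxes" should count as a PO box depends on the data, so the \b after box is a choice to make per use case.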

Details

  • p - a p
  • \.? - an optional dot (1 or 0 occurrences)
  • \s? - an optional whitespace character (1 or 0 occurrences)
  • o - an o
  • (?!\.?\s*box) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an optional dot, 0+ whitespaces and box
  • \.? - an optional dot (1 or 0 occurrences)
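The breakdown above maps one-to-one onto a pattern written with re.VERBOSE, which allows a comment per component. This is only an alternative spelling of the same regex, shown as a sketch and tested on the string from the question:

```python
import re

# Same pattern as above, one component per line (whitespace and comments are ignored).
pattern = re.compile(r"""
    p               # a literal p
    \.?             # an optional dot
    \s?             # an optional whitespace character
    o               # a literal o
    (?!\.?\s*box)   # fail if an optional dot, 0+ whitespaces and box follow
    \.?             # an optional dot, consumed only after the lookahead passed
""", re.VERBOSE)

string = " po  pobox  po box  po  box    p.o.  p.o.box  p.o. box  p.o.  box"
print(pattern.findall(string))  # ['po', 'p.o.']
```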
