
Multiple Positive Lookbehind Regex

I'm practicing regex and ran into this input:

STATE :   TEXAS

I'm going for a positive lookbehind.

This is my regex:

state = re.search(r"(?<=STATE)\s+(?<=:)\s+\w+",str(Text),re.I|re.M)

This regex fails to capture TEXAS.

However, if I do this:

state = re.search(r"(?<=STATE)\s+:\s+\w+",str(Text),re.I|re.M)

Removing the second positive lookbehind gives ": TEXAS".

However, all I want to extract is TEXAS, without the colon. Why does the second lookbehind fail to capture TEXAS, and how can it be fixed?

Think about this part of your pattern:

(?<=STATE)\s+(?<=:)

The first lookbehind says to find a position with "STATE" right before it. The \s+ says to match some whitespace. The second lookbehind then looks at the text immediately before the current position (which is what you have just matched) and requires a colon there. That's impossible, because all you've matched is whitespace. The colon is still ahead of the current position, so you can't look back and find it without consuming it during the match.

A lookbehind in the middle of your expression doesn't mean "skip ahead until you get past this part". It means to look back at the text just before the current position and check whether it matches the lookbehind expression. In the middle of a pattern, that text is what has already been consumed; at the beginning of your regex, the lookbehind instead controls where the match begins.
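To make that concrete, here is a small sketch using the sample line from the question. The variable names are just for illustration. It shows the original pattern failing, and then a single lookbehind on the colon succeeding once the colon really is the character right before the current position.

```python
import re

text = "STATE :   TEXAS"

# Original pattern: after \s+ consumes the space following "STATE",
# the character just before the current position is that space,
# so (?<=:) cannot succeed and the whole search fails.
failing = re.search(r"(?<=STATE)\s+(?<=:)\s+\w+", text, re.I | re.M)
print(failing)

# A lookbehind placed where the colon really is the preceding character
# does succeed, but the match still includes the whitespace after the colon.
working = re.search(r"(?<=:)\s+\w+", text)
print(repr(working.group()))
```

Note that the second pattern still returns the leading whitespace as part of the match, which is one reason a capturing group tends to be the simpler tool here.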

If you just want to get "TEXAS", you should capture it in a group and then extract the group after doing the match:

>>> import re
>>> data = "STATE :   TEXAS"
>>> re.search(r"STATE\s+:\s+(\w+)", data).group(1)
'TEXAS'

Don't use lookahead/lookbehind; use groups instead. (I really wish someone had told me this when I first learned regex!):

re.search(r'STATE\s+:\s+(\w+)', "STATE :   TEXAS").group(1)
Out[145]: 'TEXAS'
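One caveat with the group approach is that re.search returns None when nothing matches, so calling .group(1) directly raises an AttributeError on lines without a state. Below is a minimal sketch of a guarded version. The helper name extract_state and the group name state are made up for illustration, and the pattern is the same one used above.

```python
import re

# Same pattern as in the answers, with a named group for readability.
# STATE_RE and extract_state are hypothetical names, not from the original post.
STATE_RE = re.compile(r"STATE\s+:\s+(?P<state>\w+)", re.IGNORECASE)

def extract_state(text):
    """Return the word after 'STATE :', or None if the line doesn't match."""
    match = STATE_RE.search(text)
    return match.group("state") if match else None

print(extract_state("STATE :   TEXAS"))
print(extract_state("no state here"))
```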
