
Python regex - Find multiple characters in a substring

print 'cycle' ;While i in range(1,n) [[print "Number:" ;print i; print 'and,']]

I have a line like this, for example. I want to extract the semicolon characters only from the [[ ... ]] substring, that is, only the ones inside the double square brackets.

If I use re.search(\\[\\[.*(\\s*;).*\\]\\]), I get only one semicolon. Is there a proper solution for this?

Regex is never a great choice for things like this because it's very easy to trip up, but the following pattern works in trivial cases:

;(?=(?:(?!\[\[).)*\]\])

Pattern breakdown:

;                # match literal ";"
(?=              # lookahead assertion: assert the following pattern matches:
    (?:          
        (?!\[\[) # as long as we don't find a "[["...
        .        # ...consume the next character
    )*           # ...as often as necessary
    \]\]         # until we find "]]"
)

In other words, the pattern checks that a semicolon is followed by ]] with no [[ appearing in between.
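As a quick check, here is a minimal sketch that applies the pattern to the line from the question with Python's re module. The variable names (line, pattern, matches, positions) are only illustrative, and the sketch assumes the input is a single line, since . does not match newlines by default.

```python
import re

# The sample line from the question.
line = """print 'cycle' ;While i in range(1,n) [[print "Number:" ;print i; print 'and,']]"""

# A semicolon that can reach "]]" without passing a "[[" first.
pattern = re.compile(r";(?=(?:(?!\[\[).)*\]\])")

# findall returns every match, unlike re.search, which stops at the first one.
matches = pattern.findall(line)

# finditer gives the position of each matched semicolon.
positions = [m.start() for m in pattern.finditer(line)]

print(len(matches), positions)
```

The semicolon after 'cycle' is skipped because a [[ appears before the next ]], so only the two semicolons inside the brackets are reported.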


Examples of strings where the pattern won't work:

  • ; ]] (will match)
  • [[ ; "this is text [[" ]] (won't match)
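Both failure cases can be reproduced directly. The sketch below also shows one possible alternative that is not part of the answer above: find each [[ ... ]] span first, then scan only its contents. The helper name semicolon_positions is hypothetical, and this approach still breaks if ]] appears inside a quoted string or if the brackets nest.

```python
import re

LOOKAHEAD = re.compile(r";(?=(?:(?!\[\[).)*\]\])")

# Case 1: there is no opening "[[", yet the semicolon is still reported.
false_positive = LOOKAHEAD.findall("; ]]")

# Case 2: the "[[" inside the quoted text blocks the lookahead,
# so the semicolon inside the brackets is missed.
false_negative = LOOKAHEAD.findall('[[ ; "this is text [[" ]]')

print(false_positive, false_negative)


def semicolon_positions(text):
    """Hypothetical helper: locate each [[...]] span, then scan inside it."""
    positions = []
    # Non-greedy match from each "[[" to the nearest following "]]".
    for block in re.finditer(r"\[\[(.*?)\]\]", text, flags=re.DOTALL):
        offset = block.start(1)
        for m in re.finditer(";", block.group(1)):
            positions.append(offset + m.start())
    return positions


print(semicolon_positions("; ]]"))
print(semicolon_positions('[[ ; "this is text [[" ]]'))
```

For anything beyond simple cases like these, a small tokenizer that tracks quotes and bracket depth is likely to be more reliable than either regex.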
