
Matching the start of a substring in Python RegEx

I'm trying to match curly braces in Python unless they're escaped. The test case I'm using is this:

import re

s  = "aa}}bb"
co = re.compile(r"(^|[^e])(})")
print(s[0:], co.search(s))
print(s[1:], co.search(s, 1))
print(s[2:], co.search(s, 2))
print(s[3:], co.search(s, 3))  # This outputs None?!

The intent of the RegEx pattern is "either there is no character in front of the curly brace, or there is a character that isn't the escape (here e)". I thought the last substring being searched was s[3:] == "}bb". It doesn't match the pattern, however, and although this is quite strange, I guess this is because

  1. the search started at index 3 still knows that this position is not the start of the line, and
  2. it doesn't see the character in front of that position.

In other words: s[3:] is not actually what's being searched. One way I see to circumvent this is to call co.search(s[3:]), which gives me the start-of-line. I'd rather use search's pos argument instead of slicing, because I'm working with big strings and slicing copies memory. Can it be done?
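The difference described above can be reproduced in isolation. This is a minimal sketch; the simplified pattern `^}` and the name `anchored` are illustrative assumptions, not part of the original test case.

```python
import re

s = "aa}}bb"
anchored = re.compile(r"^}")  # hypothetical pattern: a brace at the start of the string

# With pos=3 the search starts at index 3, but ^ still refers to index 0,
# so nothing matches.
with_pos = anchored.search(s, 3)
print(with_pos)

# Slicing builds a new string "}bb" whose real beginning is the brace,
# so ^ matches there (at the cost of copying the tail of the string).
with_slice = anchored.search(s[3:])
print(with_slice)
```

So the pos argument moves the starting point of the search without changing what the pattern treats as the beginning of the string.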

  • Yes, that's documented behaviour:

    pos ... is not completely equivalent to slicing the string; the ^ pattern character matches at the real beginning of the string... but not necessarily at the index where the search is to start.

    https://docs.python.org/3/library/re.html#re.Pattern.search

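  • What you probably want here is a "negative lookbehind", which is written (?<!...); with an escape character of e that would be (?<!e). Unlike ^, a lookbehind still inspects the characters before pos, so it works with search's pos argument.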

     import re

     s = "aa}}bb"
     co = re.compile(r"(?<!e)(})")
     print(s[0:], co.search(s))
     print(s[1:], co.search(s, 1))
     print(s[2:], co.search(s, 2))
     print(s[3:], co.search(s, 3))  # now matches the brace at index 3
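As a rough sketch of how the lookbehind combines with a start offset on a larger string, the snippet below collects the indices of all unescaped braces without slicing. The helper name `unescaped_brace_positions`, the constant `UNESCAPED_BRACE`, and the sample string are assumptions made for illustration.

```python
import re

# A closing brace that is not preceded by the escape character "e".
UNESCAPED_BRACE = re.compile(r"(?<!e)}")

def unescaped_brace_positions(text, start=0):
    """Return indices of unescaped braces at or after `start` (hypothetical helper)."""
    # finditer accepts the same pos argument as search, so no slice is made.
    return [m.start() for m in UNESCAPED_BRACE.finditer(text, start)]

sample = "aa}}bbe}c}"
print(unescaped_brace_positions(sample))      # search the whole string
print(unescaped_brace_positions(sample, 3))   # start at the second brace
print(unescaped_brace_positions(sample, 7))   # start at the escaped brace
```

Note that the lookbehind consumes nothing, so each match covers only the brace itself, whereas the original pattern also captured the preceding character in group 1.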
