
Matching the start of a substring in Python RegEx

I'm trying to match closing curly braces in Python unless they're escaped. The test case I'm using is this:

import re

s  = "aa}}bb"
co = re.compile(r"(^|[^e])(})")
print(s[0:], co.search(s))
print(s[1:], co.search(s, 1))
print(s[2:], co.search(s, 2))
print(s[3:], co.search(s, 3))  # This outputs None?!

The intent of the RegEx pattern is "either there is no character in front of the curly brace, or there is a character that isn't the escape (here e)". I thought the last substring being searched was s[3:] == }bb. It doesn't match the pattern, however, and although this is quite strange, I guess this is because

  1. the RegEx-created substring does know that there is no start-of-line before it, and
  2. it doesn't know that there was any character in front of it.

In other words: s[3:] is not actually what's being searched. One way I see to circumvent this is to just call co.search(s[3:]), which will give me the start-of-line. I'd like to use search's pos argument instead of slicing, because I'm working with big strings and slicing copies memory. Can it be done?
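To make the difference concrete, here is a minimal sketch that runs the pattern from the question against the same string twice, once with the pos argument and once on a slice. It only uses names already shown above (s and co).

```python
import re

s = "aa}}bb"
co = re.compile(r"(^|[^e])(})")

# With pos=3 the scan starts at index 3, but ^ still refers to index 0
# of the full string, and [^e] cannot consume the character at index 2.
print(co.search(s, 3))       # None

# Slicing builds a new string, so ^ matches at its first character.
print(co.search(s[3:]))      # a match object with span (0, 1)
```

The slice matches only because the copy has a new "real" beginning, which is exactly the memory cost the question wants to avoid.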

  • Yes, that's documented behaviour:

    pos ... is not completely equivalent to slicing the string; the ^ pattern character matches at the real beginning of the string... but not necessarily at the index where the search is to start.

    https://docs.python.org/3/library/re.html#re.Pattern.search

  • What you probably want here is a "negative lookbehind", which is written (?<!...); so with an escape of e that'd be (?<!e)

    import re

    s = "aa}}bb"
    co = re.compile(r"(?<!e)(})")
    print(s[0:], co.search(s))
    print(s[1:], co.search(s, 1))
    print(s[2:], co.search(s, 2))
    print(s[3:], co.search(s, 3))
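As a quick check of the lookbehind approach, the sketch below searches from pos=3 in the original string and in a second string where the brace at index 3 is escaped. The string named escaped is a hypothetical test input added for illustration; it is not part of the question.

```python
import re

co = re.compile(r"(?<!e)(})")

s = "aa}}bb"
m = co.search(s, 3)
print(m.span())                 # (3, 4): the brace at index 3 now matches

# Hypothetical input: the brace at index 3 is preceded by the escape "e".
escaped = "aae}bb"
print(co.search(escaped, 3))    # None: the lookbehind sees the "e" at index 2
print(co.search(escaped[3:]))   # slicing hides the "e", so this matches
```

The lookbehind can inspect characters before pos, so passing pos keeps the escape check correct without copying the string, whereas slicing would wrongly treat an escaped brace at the cut point as unescaped.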


 