Matching the start of a substring in Python RegEx
I'm trying to match curly braces unless they're escaped in Python. The test case I'm using is this:
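```python
import re

s = "aa}}bb"
# Group 1: start of string, or any character other than the escape "e"; group 2: the brace
co = re.compile(r"(^|[^e])(})")
print(s[0:], co.search(s))
print(s[1:], co.search(s, 1))
print(s[2:], co.search(s, 2))
print(s[3:], co.search(s, 3))  # This outputs None?!
```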
The intent of the RegEx pattern is "either there is no character in front of the curly brace, or there is a character that isn't the escape (here `e`)".

The last substring I'm searching was `s[3:] == "}bb"`, I thought. It doesn't match the pattern, however, and although this is quite strange, I guess this is because of how the `pos` argument works. In other words: `s[3:]` is not actually what's being searched.

One way I see to circumvent this is to just call `co.search(s[3:])`, which gives `^` a real start of string to match. I'd rather use `search`'s `pos` argument instead of slicing, though, because I'm working with big strings and slicing copies memory. Can it be done?
Yes, that's documented behaviour:

> `pos` ... is not completely equivalent to slicing the string; the `^` pattern character matches at the real beginning of the string... but not necessarily at the index where the search is to start.

https://docs.python.org/3/library/re.html#re.Pattern.search
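A quick check of the quoted behaviour, as a minimal sketch assuming the same `s` as in the question. The pattern name `caret` is hypothetical: it matches a `}` only where `^` matches.

```python
import re

s = "aa}}bb"
caret = re.compile(r"^}")  # hypothetical: a brace only at the start of the text

# Slicing builds a new string, so index 0 of the slice is a real start and ^ matches there.
print(caret.search(s[3:]).span())  # (0, 1)

# With pos=3, ^ still refers to the real beginning of s (index 0), so nothing is found.
print(caret.search(s, 3))  # None
```

So the `None` in the question is expected: `pos` moves where the scan starts, not where `^` considers the string to begin.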
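What you probably want here is a "negative lookbehind", which is written `(?<!...)`; so with an escape of `e`, that'd be `(?<!e)`. Unlike `^`, a lookbehind can inspect the characters before `pos`, and at the very start of the string it succeeds because there is no preceding `e`, so it covers both halves of your original intent: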
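```python
import re

s = "aa}}bb"
# Match "}" only if it is not immediately preceded by the escape character "e"
co = re.compile(r"(?<!e)(})")
print(s[0:], co.search(s))
print(s[1:], co.search(s, 1))
print(s[2:], co.search(s, 2))
print(s[3:], co.search(s, 3))  # now finds the brace at index 3
```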
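The test string in the question contains no `e`, so it never exercises the escape. Below is a small sketch using a hypothetical string `t` with one escaped brace, to show that the lookbehind rejects it even when `pos` starts exactly on that brace, because the preceding `e` is still visible.

```python
import re

# Same pattern as in the answer: "}" not immediately preceded by "e"
co = re.compile(r"(?<!e)(})")

# Hypothetical test string: unescaped braces at indices 0 and 5, escaped brace at index 3
t = "}ae}b}"

# Index 0 matches because there is no preceding character at all
print([m.start() for m in co.finditer(t)])  # [0, 5]

# Starting at pos=3: the lookbehind still sees t[2] == "e", so index 3 is skipped
print(co.search(t, 3).start())  # 5
```

No slicing is involved, so the large string is never copied; only the `pos` index changes between calls.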