Why '\A' in python regex doesn't work inside [ ]?

I was trying to write a regex that would match a word at the beginning of the line or after a certain word. I tried:

r"[\A|my_word](smth)"

But it failed because \A does not match in that case. What's wrong with that?

It turns out that \A doesn't work inside []:

In [163]: type(re.search(r"\A123", "123"))
Out[163]: <type '_sre.SRE_Match'>

In [164]: type(re.search(r"[\A]123", "123"))
Out[164]: <type 'NoneType'>

But I don't understand why.

I'm using Python 2.6.6

EDIT: After some comments I realized that the example I used with [\A|my_word] is bad. The actual expression is [\AV], meant to match either the beginning of the string or V. My main question is why [\A] doesn't work.

My understanding of backslashes in bracket character classes was off, it seems. Even so, [\A|my_word] is a character class, not an alternation: \A does not act as an anchor inside it, | is a literal character, and the class matches just one of | , m , y , _ , w , o , r , or d before smth .

Here's a regular expression that should do what you want. Unfortunately, a lookbehind can't be used in Python because \A and my_word have different lengths, but a non-capturing group can be used instead: (?:\A|abc)(smth) .
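To make the non-capturing group concrete, here is a minimal sketch using the literal word abc from the answer above. The sample strings and variable names are made up for illustration.

```python
import re

# Non-capturing group: match "smth" at the very start of the string
# or immediately after the literal word "abc".
pattern = re.compile(r"(?:\A|abc)(smth)")

at_start = pattern.search("smth else")     # taken via the \A branch
after_word = pattern.search("xx abcsmth")  # taken via the "abc" branch
no_match = pattern.search("xx smth")       # neither branch applies

# A lookbehind with the same alternatives is rejected by the re module,
# because the two branches have different widths (0 and 3).
try:
    re.compile(r"(?<=\A|abc)smth")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
```

Because the group is non-capturing, group(1) is still the smth part in both successful matches.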

(You can also use ^ instead of \A if you want, though the behavior differs in multiline mode, where ^ also matches at the start of each new line, or rather, immediately after every newline.)
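The difference between ^ and \A in multiline mode can be seen with a two-line string. This is only a small sketch, and the sample text is an assumption chosen so that smth starts the second line.

```python
import re

text = "first\nsmth"

# Without flags, ^ matches only at the start of the string, like \A.
caret_default = re.search(r"^smth", text)

# With re.MULTILINE, ^ also matches right after each newline...
caret_multiline = re.search(r"^smth", text, re.MULTILINE)

# ...while \A still matches only at the very start of the string.
anchor_multiline = re.search(r"\Asmth", text, re.MULTILINE)
```

Only caret_multiline finds a match here, starting right after the newline.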

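In POSIX bracket expressions, the \ character loses its special meaning as an escape character.

That is, inside [ ] it is treated as two characters: \ and A . (Python's re module does still process backslash escapes inside [ ], but \A cannot act as an anchor there.)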

[REF]

Regex references:

The Single UNIX Specification

Python 2.6 - re module

UPDATE

A bracket expression is a special case in itself, so it is very unlikely that special sequences like \A (almost control commands for the regex) would work there. It would be somewhat unnatural.

ONE MORE THING

As stated in the Python reference:

(brackets) Used to indicate a set of characters.

\A is a special sequence which:

Matches only at the start of the string.

It is obviously not a character in any set. I know \n is NEWLINE, but I've never heard of a STARTLINE character (it might be a pretty one).

Also, for escapists: you can even put ] inside brackets without escaping it, as long as it comes right after the opening [ :

The pattern []] will match ']', for example.
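A quick sketch of that rule with Python's re module follows. The sample strings are made up, and the escaped form is shown only for comparison.

```python
import re

# A ] placed right after the opening [ is taken literally.
leading = re.findall(r"[]]", "a]b]")

# The escaped form matches the same characters.
escaped = re.findall(r"[\]]", "a]b]")

# Further members can follow the leading ] in the same class.
with_more = re.findall(r"[]a]", "a]b")
```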

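[\A] is a character class, so it can only ever match a single character (in POSIX-style bracket expressions, either a \ or an A ), never the start-of-string position. This is probably not what you wanted.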
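How [\A] is actually handled seems to depend on the Python version: the question shows it compiling on 2.6.6, while newer Python 3 releases reject it as a bad escape. Rather than assume either behavior, a small probe can report what the running interpreter does. The helper name probe_class_with_anchor is hypothetical, introduced only for this sketch.

```python
import re

def probe_class_with_anchor(pattern=r"[\A]"):
    """Report how the running interpreter treats an anchor escape in a class."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # The escape is refused at compile time.
        return "rejected: %s" % exc
    # If it compiles, check which of a few single characters it accepts.
    accepted = [ch for ch in ("\\", "A", "1") if compiled.match(ch)]
    return "compiled; matches %r" % (accepted,)

result = probe_class_with_anchor()
print(result)
```

Either way, the class never behaves like the \A anchor.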

Anchors vs Character Classes

\A is an anchor that matches a position in the string, in this case the position before the first character. Other anchors include \b (word boundary), ^ (start of string/line), $ (end of string/line), (?=...) (positive lookahead), and (?!...) (negative lookahead). Anchors consume no characters and only match a position within the string.

[abc] is a character class that always matches exactly one character, in this case either a , b or c .

Thus, placing an anchor inside a character class makes no sense.
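The distinction can be checked by looking at match spans: an anchor yields a zero-width match, while a character class consumes exactly one character. The following is a minimal sketch with made-up sample strings.

```python
import re

# An anchor matches a position: the span is empty.
anchor_span = re.search(r"\A", "abc").span()

# A character class consumes exactly one character.
class_match = re.search(r"[abc]", "xbz")
class_span = class_match.span()

# Word boundaries are positions too: every match is zero-width.
boundary_spans = [m.span() for m in re.finditer(r"\b", "hi there")]
```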
