I'm looking at someone else's regex. I can make out that I'm dealing with a positive lookbehind, but I'm not sure what it's supposed to match: (?<=[^])\\t{2,}|(?<=[>])
I know [stuff] matches any character among s, t, u, and f. And I know [^stuff] matches any character not among those.
But what does [^] mean? I guess it could mean "anything not of length zero", i.e. "anything". But then why wouldn't one just use some variant of the simple . expression (extended to also capture newlines)?
Update:
Per Wiktor's comment, [^] alone isn't valid. But that still leaves me wondering what this thing is supposed to do...
To me, an intuitive reading is...
(?<=[^]) - look behind for whatever [^] matches
\\t{2,} - then find two or more tabs
| - if there's not a match for that...
(?<=[>]) - ...look behind for a > character.
Where is my interpretation missing the mark?
The [^] does not match anything, since it is an invalid pattern. It is never even tried: it fails at the parsing stage. The [^>], on the other hand, is a negated character class that matches any char but >.
The [^] is an invalid pattern in the majority of regex flavors other than ECMAScript. In .NET, it throws an "Unterminated [] set" exception.
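The parse-stage failure is easy to check. This is a minimal sketch in Python rather than .NET (an assumption made so the example is simple to run): Python's re module also rejects a bare [^], although its error message differs from the .NET one. The helper name compiles is hypothetical.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the pattern parses, False if the regex engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The ] right after ^ is read as a literal, so the class is never closed.
print(compiles("[^]"))

# A valid negated class: any char but >
print(compiles("[^>]"))
```

The first call reports a rejected pattern and the second an accepted one, which is why the lookbehind in the question can never have contained a standalone [^].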
To match any char, including a newline, use (?s:.) (a . pattern with the RegexOptions.Singleline option).
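A short sketch of the difference, again in Python as a stand-in for .NET (an assumption): Python 3.6 and later accept the same scoped (?s:.) syntax, and its re.DOTALL flag plays the role of RegexOptions.Singleline. The variable names are illustrative only.

```python
import re

text = "a\nb"

# A plain dot skips the newline by default.
plain = re.findall(r".", text)

# The scoped inline flag (?s:...) enables "dot matches newline" inside the group only.
dotall = re.findall(r"(?s:.)", text)

print(plain)
print(dotall)
```

Only the second list contains the newline character.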
The (?<=[^])\\t{2,}|(?<=[>]) pattern represents a single positive lookbehind that matches a location immediately preceded by the [^])\\t{2,}|(?<=[>] pattern, which is a negated character class matching any single char but ], ), tab, {, 2, ,, }, |, (, ?, <, =, [, and >. All the chars from the [^ to the last ] are "negated" because the first ] after ^ is considered a literal ] symbol.
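The single-lookbehind reading can be checked with a small sketch. It uses Python instead of .NET (an assumption; Python's re module also treats a ] placed right after [^ as a literal), and it assumes the doubled backslash in the quoted pattern is an escaped form of \t. The function name match_positions is hypothetical.

```python
import re

# Assumption: \\t in the quoted pattern stands for the regex escape \t (a tab).
PATTERN = re.compile(r"(?<=[^])\t{2,}|(?<=[>])")

def match_positions(text):
    """Return the start index of every zero-width match in text."""
    return [m.start() for m in PATTERN.finditer(text)]

# After S the lookbehind succeeds; after > it fails, because > is in the negated set.
print(match_positions("S>"))

# A tab is in the negated set too, so there is no match right after it.
print(match_positions("a\tb"))
```

Every match is an empty string: the pattern consumes nothing and only asserts which character precedes the current position.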
You may see this in the regex demo, where it matches a location after S.
Basically, you always need to watch out for characters that are not word chars and, to play it safe, you may escape all non-word chars.
Inside a character class, there are only 4 chars that are "special":
^
]
\
-
If you want to avoid misunderstanding, always escape them.
If you want to show off before your boss/customer, note that you do not have to escape them if...
- : if it appears at the end/start of the character class, or between a char and a valid range/shorthand character class, and if it is not part of a character class subtraction construct
] : if it appears right at the beginning of the character class AND it is not the only char in the character class
^ : if it is not the first char in the positive character class
And \ must always be escaped.
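The two spellings can be compared side by side. This is a sketch in Python rather than .NET (an assumption), so it covers only the positional rules for ], ^ and -. Character class subtraction is a .NET construct and is not exercised here. The variable names are illustrative only.

```python
import re

text = "a^b]c\\d-e"  # contains ^, ], \ and -

# Every special char escaped: the safe, explicit spelling.
escaped = re.findall(r"[\^\]\\\-]", text)

# Minimal escaping: ] first, ^ not first, - last; only \ stays escaped.
minimal = re.findall(r"[]^\\-]", text)

print(escaped)
print(minimal)
```

Both classes pick out the same four characters, but the fully escaped one does not depend on where each character sits inside the brackets.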