
What does [^] match in C# regex?

I'm looking at someone else's regex... I can make out that I'm dealing with a positive lookbehind, but I'm not sure what it's supposed to match: (?<=[^])\\t{2,}|(?<=[>])

I know [stuff] matches any character among s, t, u, and f. And I know [^stuff] matches any character not among those.

But what does [^] mean? I guess it could mean "anything not of length zero", i.e. "anything". But then why wouldn't one just use some variation on the simple . expression (one that also captures newlines)?

Update:

Per Wikter's comment, [^] alone isn't valid. But that still leaves me wondering what this thing is supposed to do...

To me, an intuitive reading is...

(?<=[^]) - look behind for whatever [^] matches

\\t{2,} - then find two or more tabs

| - if there's not a match for that...

(?<=[>]) - ...look behind for a > character.

Where is my interpretation missing the mark?

The [^] does not match anything because it is an invalid pattern. It is never even tried: it fails at the parsing stage. The [^>], on the other hand, is a negated character class that matches any char but >.

The [^] is an invalid pattern in the majority of regex flavors other than ECMAScript. In .NET, it throws an exception with the message "Unterminated [] set".

To match any char, including a newline, use (?s:.) (a . pattern with the inline form of the RegexOptions.Singleline option).
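To check these claims yourself, here is a minimal sketch. The class and method names (EmptyNegatedClassDemo, IsValidPattern) are made up for illustration; the only library calls are the Regex constructor and Regex.IsMatch. It relies on .NET reporting a pattern parse failure as an ArgumentException (or a subclass of it) thrown from the constructor.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyNegatedClassDemo
{
    // Hypothetical helper: true if the .NET regex parser accepts the pattern.
    // A malformed pattern makes the Regex constructor throw ArgumentException
    // (or a type derived from it), so nothing is ever matched.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPattern("[^]"));   // False: unterminated set
        Console.WriteLine(IsValidPattern("[^>]"));  // True: any char but >

        // A plain . does not match a newline; the Singleline forms do.
        Console.WriteLine(Regex.IsMatch("\n", "."));                          // False
        Console.WriteLine(Regex.IsMatch("\n", "(?s:.)"));                     // True
        Console.WriteLine(Regex.IsMatch("\n", ".", RegexOptions.Singleline)); // True
    }
}
```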

The (?<=[^])\\t{2,}|(?<=[>]) pattern is a single positive lookbehind. It matches a location that is immediately preceded by a char matching [^])\\t{2,}|(?<=[>], which is a negated character class matching any single char except ], ), tab, {, 2, comma, }, |, (, ?, <, =, [ and >. Everything from the [^ to the last ] is "negated" because the first ] after ^ is treated as a literal ] symbol.
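A small sketch of that reading, under one assumption: the \\t in the quoted pattern is C# string-literal escaping, so the regex engine actually sees \t (a tab). The class and method names are hypothetical. Every match is zero-length, so the helper only reports match positions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Assumption: the engine sees \t (tab), written here as a verbatim string.
    public const string Pattern = @"(?<=[^])\t{2,}|(?<=[>])";

    // Hypothetical helper: index of every (zero-length) match in the input.
    public static int[] MatchPositions(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }

    public static void Main()
    {
        // Only the position right after S matches; > is in the excluded set.
        Console.WriteLine(string.Join(",", MatchPositions("S>")));      // 1

        // Positions after a and b match; positions after a tab do not.
        Console.WriteLine(string.Join(",", MatchPositions("ab\t\t")));  // 1,2

        // The digit 2 is excluded too, because {2,} is read as literal chars.
        Console.WriteLine(string.Join(",", MatchPositions("x2")));      // 1
    }
}
```

Note that the tabs are never consumed: nothing in the pattern lies outside the lookbehind, so the "two or more tabs" part of the intuitive reading never applies.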

In the regex demo, the pattern matches the location right after S.

Basically, you need to always watch out for characters that are not word chars, and to play it safe, you may escape all non-word chars.

Inside a character class, there are only 4 chars that are "special":

^
]
\
-

If you want to avoid misunderstanding, always escape them.

If you want to show off before your boss/customer, note that you do not have to escape them if...

  • - : if it appears at the end/start of the character class, or between a char and a valid range/shorthand character class, and if it is not part of a character class subtraction construct
  • ] : if it appears right at the beginning of the character class AND it is not the only char in the character class
  • ^ : if it is not the first char in the character class.

And \ must always be escaped (written as \\).
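As a rough illustration of the escaping advice, the sketch below builds one character class with all four special chars escaped, plus a few unescaped forms that are still read literally. The class and helper names are hypothetical, and only Regex.IsMatch is used from the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassEscapingDemo
{
    // All four special chars escaped: ^ ] \ -
    public const string Escaped = @"[\^\]\\\-]";

    // Hypothetical helper: true if the whole input is one char from the class.
    public static bool IsInClass(string input, string charClass)
    {
        return Regex.IsMatch(input, @"\A" + charClass + @"\z");
    }

    public static void Main()
    {
        foreach (string s in new[] { "^", "]", "\\", "-", "a" })
        {
            // True for the four special chars, False for a
            Console.WriteLine(s + " -> " + IsInClass(s, Escaped));
        }

        // Unescaped, but still literal because of their position:
        Console.WriteLine(IsInClass("^", "[a^]")); // True: ^ is not first
        Console.WriteLine(IsInClass("-", "[a-]")); // True: - is last
        Console.WriteLine(IsInClass("-", "[-a]")); // True: - is first
    }
}
```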
