
Regular expression for not splitting string if inside single or double quotes

I have a regular expression with the following pattern in C#:

Regex param = new Regex(@"^-|^/|=|:");

Basically, it's for command-line parsing.

If I pass the command-line arguments below, it splits on the colon in C: as well.

/Data:SomeData /File:"C:\Somelocation"

How do I make it ignore characters inside double or single quotes?

You can do this in two steps:

Use the first regex

Regex args = new Regex("[/-](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

to split the string into the different arguments. Then use the regex

Regex param = new Regex("[=:](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

to split each of the arguments into parameter/value pairs.
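
As a rough sketch of how the two steps could fit together, the snippet below runs both regexes over the command line from the question. The `ArgSplitter` class and its `Parse` helper are hypothetical names introduced only for illustration; the two patterns are the ones given above, unchanged. It assumes each argument has at most one name/value separator outside quotes, so the second split is limited to two pieces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ArgSplitter
{
    // The two regexes from the answer, unchanged.
    static readonly Regex ArgsRegex =
        new Regex("[/-](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
    static readonly Regex ParamRegex =
        new Regex("[=:](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper (not part of the answer): returns name/value pairs.
    public static List<KeyValuePair<string, string>> Parse(string commandLine)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string raw in ArgsRegex.Split(commandLine))
        {
            // Split yields an empty piece before the leading '/' and keeps
            // the whitespace between arguments, so trim and skip blanks.
            string arg = raw.Trim();
            if (arg.Length == 0) continue;

            // Split into at most two pieces: name and (optional) value.
            string[] parts = ParamRegex.Split(arg, 2);
            string value = parts.Length > 1 ? parts[1] : "";
            pairs.Add(new KeyValuePair<string, string>(parts[0], value));
        }
        return pairs;
    }

    static void Main()
    {
        foreach (var pair in Parse(@"/Data:SomeData /File:""C:\Somelocation"""))
            Console.WriteLine(pair.Key + " = " + pair.Value);
        // Prints:
        //   Data = SomeData
        //   File = "C:\Somelocation"
    }
}
```

Note that the value keeps its surrounding quotes, and that the first regex treats any / or - outside double quotes as the start of a new argument, so an unquoted value such as foo-bar would be cut in two.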

Explanation:

[=:]      # Split on this regex...
(?=       # ...only if the following matches afterwards:
 (?:      # The following group...
  [^"]*"  #  any number of non-quote characters, then one quote
  [^"]*"  #  repeat, to ensure even number of quotes
 )*       # ...repeated any number of times, including zero,
 [^"]*    # followed by any number of non-quotes
 $        # until the end of the string.
)         # End of lookahead.

Basically, the lookahead checks whether an even number of quotes follows the current position. If so, we're outside of a quoted string. However, this (somewhat manageable) regex only handles double quotes, and only if there are no escaped quotes inside them.
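
To see those two limits in practice, here is a small sketch that counts how many pieces the simple pattern produces for three sample arguments. The `LimitationDemo` class and the sample strings are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class LimitationDemo
{
    // The simple parameter regex from the answer, unchanged.
    static readonly Regex ParamRegex =
        new Regex("[=:](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] Split(string arg)
    {
        return ParamRegex.Split(arg);
    }

    static void Main()
    {
        string[] samples =
        {
            @"File:""C:\Somelocation""",  // 2 pieces: colon inside double quotes is ignored
            @"Name:'a:b'",                // 3 pieces: single quotes are not recognized
            @"Msg:""say \""a:b\"""""      // 3 pieces: escaped quotes upset the even/odd count
        };
        foreach (string s in samples)
            Console.WriteLine(s + " -> " + Split(s).Length + " pieces");
    }
}
```

Only the first sample splits the way one would want; the other two are cut at the inner colon as well.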

The following regex handles single and double quotes, including escaped quotes, correctly. But I guess you'll agree that if anybody ever finds this in production code, I'm guaranteed a feature article on The Daily WTF:

Regex param = new Regex(
    @"[=:]
    (?=      # Assert even number of (relevant) single quotes, looking ahead:
     (?:
      (?:\\.|""(?:\\.|[^""\\])*""|[^\\'""])*
      '
      (?:\\.|""(?:\\.|[^""'\\])*""|[^\\'])*
      '
     )*
     (?:\\.|""(?:\\.|[^""\\])*""|[^\\'])*
     $
    )
    (?=      # Assert even number of (relevant) double quotes, looking ahead:
     (?:
      (?:\\.|'(?:\\.|[^'\\])*'|[^\\'""])*
      ""
      (?:\\.|'(?:\\.|[^'""\\])*'|[^\\""])*
      ""
     )*
     (?:\\.|'(?:\\.|[^'\\])*'|[^\\""])*
     $
    )", 
    RegexOptions.IgnorePatternWhitespace);

Further explanation of this monster here.

You should read "Mastering Regular Expressions" to understand why there's no general solution to your question. Regexes cannot handle that to an arbitrary depth. As soon as you start to escape the escape character, or to escape the escaping of the escape character, and so on, you're lost. Your use case needs a parser, not a regex.
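
To make the "parser, not a regex" suggestion a bit more concrete, here is a minimal hand-written scanner. It is only a sketch under stated assumptions, not a full command-line parser: `QuoteAwareParser`, `Tokenize` and `SplitPair` are hypothetical names, arguments are assumed to be separated by whitespace, and a backslash is treated as an escape only when it directly precedes a quote character (so the backslashes in a Windows path stay literal).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuoteAwareParser
{
    static bool IsQuote(char c) { return c == '"' || c == '\''; }

    // Splits a command line into arguments on whitespace outside quotes.
    public static List<string> Tokenize(string line)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        char quote = '\0'; // '\0' means "not inside quotes"
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '\\' && i + 1 < line.Length && IsQuote(line[i + 1]))
            {
                // Escaped quote: copy both characters, do not toggle quote state.
                current.Append(c).Append(line[++i]);
            }
            else if (quote == '\0' && char.IsWhiteSpace(c))
            {
                if (current.Length > 0) { tokens.Add(current.ToString()); current.Clear(); }
            }
            else
            {
                if (quote == '\0' && IsQuote(c)) quote = c;  // opening quote
                else if (c == quote) quote = '\0';           // matching closing quote
                current.Append(c);
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    // Splits one argument at the first '=' or ':' outside quotes.
    public static KeyValuePair<string, string> SplitPair(string token)
    {
        if (token.Length > 0 && (token[0] == '/' || token[0] == '-'))
            token = token.Substring(1); // drop the switch prefix
        char quote = '\0';
        for (int i = 0; i < token.Length; i++)
        {
            char c = token[i];
            if (c == '\\' && i + 1 < token.Length && IsQuote(token[i + 1])) { i++; continue; }
            if (quote == '\0' && IsQuote(c)) quote = c;
            else if (c == quote) quote = '\0';
            else if (quote == '\0' && (c == '=' || c == ':'))
                return new KeyValuePair<string, string>(token.Substring(0, i), token.Substring(i + 1));
        }
        return new KeyValuePair<string, string>(token, ""); // bare switch, no value
    }

    static void Main()
    {
        foreach (string token in Tokenize(@"/Data:SomeData /File:""C:\Somelocation"" /Msg:'a b:c'"))
        {
            var pair = SplitPair(token);
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Because the quote state is tracked explicitly, single quotes, double quotes and escaped quotes are all handled in one pass, and the escaping convention can be changed in one place. A quoted value that ends in a backslash (for example a directory path with a trailing \) would be misread under the escaping assumption above, so that rule would need adjusting for real use.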
