
Regular expression for not splitting a string inside single or double quotes

I have a regular expression with the following pattern in C#:

Regex param = new Regex(@"^-|^/|=|:");

Basically, it's for command-line parsing.

If I pass the command-line arguments below, it also splits on the colon in C:

/Data:SomeData /File:"C:\Somelocation"

How do I make it not apply to characters inside double or single quotes?

You can do this in two steps:

Use the first regex

Regex args = new Regex("[/-](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

to split the string into the different arguments. Then use the regex

Regex param = new Regex("[=:](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

to split each of the arguments into parameter/value pairs.
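As a rough illustration, the two steps could be wired together as below. This is only a sketch: the class name `ArgSplitter` and the method `Parse` are hypothetical names chosen for the example, and the trimming and skipping of empty pieces is an assumption (`Regex.Split` yields a leading empty string when the input starts with a separator).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the original answer.
public static class ArgSplitter
{
    // Step 1: '/' or '-' followed by an even number of double quotes.
    static readonly Regex ArgSeparator =
        new Regex("[/-](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Step 2: '=' or ':' followed by an even number of double quotes.
    static readonly Regex ParamSeparator =
        new Regex("[=:](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static List<string[]> Parse(string commandLine)
    {
        var result = new List<string[]>();
        foreach (string raw in ArgSeparator.Split(commandLine))
        {
            // Assumption: surrounding whitespace and empty pieces are not wanted.
            string arg = raw.Trim();
            if (arg.Length == 0) continue;
            result.Add(ParamSeparator.Split(arg));
        }
        return result;
    }
}
```

Calling `ArgSplitter.Parse("/Data:SomeData /File:\"C:\\Somelocation\"")` gives two entries: `Data` / `SomeData` and `File` / `"C:\Somelocation"`. The quotes stay on the value, and the colon after `C` is left alone. Note that an unquoted `-` or `/` inside a value would still be treated as an argument separator.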

Explanation:

[=:]      # Split on this regex...
(?=       # ...only if the following matches afterwards:
 (?:      # The following group...
  [^"]*"  #  any number of non-quote characters, then one quote
  [^"]*"  #  repeat, to ensure even number of quotes
 )*       # ...repeated any number of times, including zero,
 [^"]*    # followed by any number of non-quotes
 $        # until the end of the string.
)         # End of lookahead.

Basically, it looks ahead to check whether an even number of quotes remains between the current position and the end of the string. If so, we're outside of a quoted string. However, this (somewhat manageable) regex only handles double quotes, and only if there are no escaped quotes inside those.
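The parity check that the lookahead performs can also be spelled out without a regex. The following is only a sketch of the same idea; `QuoteParity` and `IsOutsideQuotes` are hypothetical names, and like the simple regex it ignores single quotes and escaped quotes.

```csharp
using System;

// Hypothetical helper that mirrors the lookahead's logic in plain code.
public static class QuoteParity
{
    // Returns true if an even number of double quotes follows text[index],
    // i.e. the character at that index is not inside a quoted string.
    public static bool IsOutsideQuotes(string text, int index)
    {
        int quotesAhead = 0;
        for (int i = index + 1; i < text.Length; i++)
        {
            if (text[i] == '"') quotesAhead++;
        }
        return quotesAhead % 2 == 0;
    }
}
```

For the text `File:"C:\x"`, the first colon (index 4) has two quotes after it and counts as outside, while the second colon (index 7) has one quote after it and counts as inside.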

The following regex handles single and double quotes, including escaped quotes, correctly. But I guess you'll agree that if anybody ever finds this in production code, I'm guaranteed a feature article on The Daily WTF:

Regex param = new Regex(
    @"[=:]
    (?=      # Assert even number of (relevant) single quotes, looking ahead:
     (?:
      (?:\\.|""(?:\\.|[^""\\])*""|[^\\'""])*
      '
      (?:\\.|""(?:\\.|[^""'\\])*""|[^\\'])*
      '
     )*
     (?:\\.|""(?:\\.|[^""\\])*""|[^\\'])*
     $
    )
    (?=      # Assert even number of (relevant) double quotes, looking ahead:
     (?:
      (?:\\.|'(?:\\.|[^'\\])*'|[^\\'""])*
      ""
      (?:\\.|'(?:\\.|[^'""\\])*'|[^\\""])*
      ""
     )*
     (?:\\.|'(?:\\.|[^'\\])*'|[^\\""])*
     $
    )", 
    RegexOptions.IgnorePatternWhitespace);

Further explanation of this monster here.

You should read "Mastering Regular Expressions" to understand why there's no general solution to your question. Regexes cannot handle that to an arbitrary depth. As soon as you start to escape the escape character, or to escape the escaping of the escape character, and so on, you're lost. Your use case needs a parser and not a regex.
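To make the parser suggestion concrete, here is a minimal sketch of a character-by-character scanner. Everything in it is an assumption made for illustration: the name `CommandLineScanner`, the rule that arguments are separated by unquoted whitespace, the rule that a backslash only escapes the quote character that opened the current string, and the choice to strip quotes and leading `/` or `-` from the output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical scanner; a sketch under the assumptions stated above.
public static class CommandLineScanner
{
    public static List<KeyValuePair<string, string>> Parse(string commandLine)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        var name = new StringBuilder();
        var value = new StringBuilder();
        bool inValue = false; // true once the first unquoted '=' or ':' was seen
        char quote = '\0';    // the open quote character, or '\0' outside quotes

        for (int i = 0; i <= commandLine.Length; i++)
        {
            bool atEnd = i == commandLine.Length;
            char c = atEnd ? ' ' : commandLine[i];
            StringBuilder current = inValue ? value : name;

            if (!atEnd && quote != '\0')
            {
                // Inside quotes: only an escaped or closing quote is special.
                if (c == '\\' && i + 1 < commandLine.Length && commandLine[i + 1] == quote)
                {
                    current.Append(quote);
                    i++; // skip the escaped quote
                }
                else if (c == quote) quote = '\0';
                else current.Append(c);
            }
            else if (atEnd || char.IsWhiteSpace(c))
            {
                // End of an argument: emit it and reset the state.
                if (name.Length > 0 || value.Length > 0)
                {
                    pairs.Add(new KeyValuePair<string, string>(
                        name.ToString().TrimStart('/', '-'), value.ToString()));
                }
                name.Clear();
                value.Clear();
                inValue = false;
            }
            else if (c == '"' || c == '\'') quote = c;
            else if (!inValue && (c == '=' || c == ':')) inValue = true;
            else current.Append(c);
        }
        return pairs;
    }
}
```

With the input from the question, this yields `Data` / `SomeData` and `File` / `C:\Somelocation`. One consequence of the escaping rule assumed here is that a quoted path ending in a backslash, such as `"C:\dir\"`, would be read as containing an escaped quote, so a real parser would need a deliberate decision on that case.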
