Best C# Regex pattern to get a function's parameter values in a raw string?

I'm parsing HTML code in a C# project.

Assuming that we have this string:

<a href="javascript:func('data1','data2'...)">...</a>

Or, after the necessary .Substring() calls, this one:

func('data1','data2'...)

What would be the best Regex pattern to retrieve func()'s parameters without relying on the delimiter characters (' and ,), since they can sometimes be part of a parameter's string?

You should not use regex to parse programming language code, because it's not a regular language. This article explains why: Can regular expressions be used to match nested patterns?


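And to prove my point, allow me to share an actual solution with a regex that I think will match what you want (the doubled "" is how a double quote is written inside a C# verbatim string, and the commented layout assumes RegexOptions.IgnorePatternWhitespace):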

^                               # Start of string
[^()'""]+\(                     # matches `func(`
                                #
(?>                             # START - Iterator (match each parameter)
 (?(param)\s*,(?>\s*))          # if it's not the 1st parameter, start with a `,`
 (?'param'                      # opens 'param' (main group, captures each parameter)
                                #
   (?>                          # Group: matches every char in parameter
      (?'qt'['""])              #  ALTERNATIVE 1: strings (matches ""foo"",'ba\'r','g)o\'o')
      (?:                       #   match anything inside quotes
        [^\\'""]+               #    any char except quotes or escapes
        |(?!\k'qt')['""]        #    or the quotes not used here (ie ""double'quotes"")
        |\\.                    #    or any escaped char
      )*                        #   repeat: *
      \k'qt'                    #   close quotes
   |  (?'parens'\()             #  ALTERNATIVE 2: `(` open nested parens (nested func)
   |  (?'-parens'\))            #  ALTERNATIVE 3: `)` close nested parens
   |  (?'braces'\{)             #  ALTERNATIVE 4: `{` open braces
   |  (?'-braces'})             #  ALTERNATIVE 5: `}` close braces
   |  [^,(){}\\'""]             #  ALTERNATIVE 6: anything else (var, funcName, operator, etc)
   |  (?(parens),)              #  ALTERNATIVE 7: `,` a comma if inside parens
   |  (?(braces),)              #  ALTERNATIVE 8: `,` a comma if inside braces
   )*                           # Repeat: *
                                # CONDITIONS:
  (?(parens)(?!))               #  a. balanced parens
  (?(braces)(?!))               #  b. balanced braces
  (?<!\s)                       #  c. no trailing spaces
                                #
 )                              # closes 'param'
)*                              # Repeat the whole thing once for every parameter
                                #
\s*\)\s*(?:;\s*)?               # matches `)` at the end of func(), maybe with a `;`
$                               # END

One-liner:

^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$

Test online
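To make the one-liner easier to try from C#, here is a minimal sketch of how it could be wrapped. The class name `FuncCallRegex` and the method `GetParameters` are made up for illustration; the pattern itself is the one-liner above, unchanged. Because `param` sits inside a repeated group, each parameter is read from `Groups["param"].Captures` rather than from the group's single `Value`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the one-liner pattern; names are illustrative only.
public static class FuncCallRegex
{
    // Inside a verbatim string (@"..."), "" stands for one double-quote character.
    private static readonly Regex Pattern = new Regex(
        @"^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$");

    // Returns one entry per parameter (quotes included),
    // or an empty list when the input does not match the pattern.
    public static List<string> GetParameters(string call)
    {
        var result = new List<string>();
        Match m = Pattern.Match(call);
        if (!m.Success)
        {
            return result;
        }

        // A repeated named group keeps every iteration in its Captures collection.
        foreach (Capture c in m.Groups["param"].Captures)
        {
            result.Add(c.Value);
        }
        return result;
    }
}
```

Calling `FuncCallRegex.GetParameters("func('a,b', 2)")` should keep the comma inside the quoted string as part of the first parameter instead of splitting on it. Note that the pattern is anchored with `^` and `$`, so it expects the bare call, not the surrounding HTML.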

As you can imagine by now (if you're still reading), even with an indented pattern and comments on every construct, this regex is unreadable, quite difficult to maintain and almost impossible to debug... And I would guess there are inputs that make it fail.

Just in case a stubborn mind is still interested, here's a link to the logic behind it: Matching Nested Constructs with Balancing Groups (regular-expressions.info)
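For comparison, the non-regex route can be sketched as a small hand-written scanner that walks the string once, tracking the current quote character and the nesting depth. This is only an illustration under simplifying assumptions, not a JavaScript parser: `ArgumentScanner` and `SplitArguments` are hypothetical names, backslash is assumed to be the only escape inside strings, and bracket types are counted but not checked against each other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class ArgumentScanner
{
    // Splits a call such as func('a,b', g(1,2)) into its top-level arguments.
    public static List<string> SplitArguments(string call)
    {
        var args = new List<string>();
        int open = call.IndexOf('(');
        if (open < 0)
        {
            return args;            // no call found
        }

        int depth = 0;              // nesting depth of (), {} and [] inside the argument list
        char quote = '\0';          // quote character we are inside, or '\0' if none
        int start = open + 1;       // index where the current argument begins

        for (int i = open + 1; i < call.Length; i++)
        {
            char c = call[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;                     // skip the escaped character
                else if (c == quote) quote = '\0';      // closing quote
            }
            else if (c == '\'' || c == '"')
            {
                quote = c;
            }
            else if (c == '(' || c == '{' || c == '[')
            {
                depth++;
            }
            else if (c == ')' || c == '}' || c == ']')
            {
                if (depth > 0) { depth--; continue; }
                if (c != ')') throw new FormatException("Unbalanced bracket in: " + call);

                // This is the closing parenthesis of the call itself.
                string last = call.Substring(start, i - start).Trim();
                if (last.Length > 0 || args.Count > 0) args.Add(last);
                return args;
            }
            else if (c == ',' && depth == 0)
            {
                // A comma outside quotes and nesting separates two arguments.
                args.Add(call.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        throw new FormatException("Unterminated call: " + call);
    }
}
```

Each branch here can be stepped through in a debugger, which is the main practical advantage over the balancing-group pattern.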

Try this

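            using System.Collections.Generic;
            using System.Text.RegularExpressions;

            string input = "<a href=\"javascript:func('data1','data2'...)\">...</a>";

            // Step 1: capture everything between the parentheses of the call.
            // [^)]+ stops at the first ')', so nested calls or a ')' inside a
            // quoted parameter are not handled.
            string pattern1 = @"\w+\((?'parameters'[^)]+)\)";
            Match match1 = Regex.Match(input, pattern1);
            string parameters = match1.Success ? match1.Groups["parameters"].Value : "";

            // Step 2: take each run of word characters as one parameter value.
            // This only works for simple values such as data1; quotes are dropped
            // and a value containing spaces or punctuation is split into pieces.
            string pattern2 = @"\w+";
            MatchCollection matches = Regex.Matches(parameters, pattern2);

            List<string> results = new List<string>();
            foreach (Match match in matches)
            {
                results.Add(match.Value);
            }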
