
Best C# regex pattern to get a function's parameter values from a raw string?

I'm parsing HTML code in a C# project.

Assuming that we have this string:

<a href="javascript:func('data1','data2'...)">...</a>

Or, after the necessary .Substring() calls, this one:

func('data1','data2'...)

What would be the best regex pattern to retrieve func()'s parameters without relying on the delimiter characters (' and ,), since they can sometimes appear inside a parameter's string?

You should not use regex to parse programming-language code, because its syntax is not a regular language. This article explains why: Can regular expressions be used to match nested patterns?
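Because an argument list can contain nested calls and quoted commas, a common alternative is a small hand-written scanner that tracks quotes, escapes and nesting depth. The sketch below is only an illustration, assuming JavaScript-style '...' and "..." strings with backslash escapes, and an input that is a single call such as func('data1','data2'). SplitArguments is a hypothetical helper name, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (illustration only): splits the text between the outer
// parentheses of a single call such as func('a,b', "c)d", g(1, 2)) into its
// top-level arguments, respecting quotes, backslash escapes and nesting.
static List<string> SplitArguments(string call)
{
    int open = call.IndexOf('(');
    int close = call.LastIndexOf(')');
    if (open < 0 || close < open)
        throw new ArgumentException("No argument list found.", nameof(call));

    var args = new List<string>();
    var current = new StringBuilder();
    char quote = '\0';   // active quote character, or '\0' when outside a string
    int depth = 0;       // nesting depth of (), [] and {} inside the argument list

    for (int i = open + 1; i < close; i++)
    {
        char c = call[i];
        if (quote != '\0')
        {
            current.Append(c);
            if (c == '\\' && i + 1 < close)
                current.Append(call[++i]);      // keep the escaped character as-is
            else if (c == quote)
                quote = '\0';                   // matching quote ends the string
        }
        else if (c == '\'' || c == '"')
        {
            quote = c;                          // a string literal starts here
            current.Append(c);
        }
        else if (c == '(' || c == '[' || c == '{')
        {
            depth++;
            current.Append(c);
        }
        else if (c == ')' || c == ']' || c == '}')
        {
            depth--;
            current.Append(c);
        }
        else if (c == ',' && depth == 0)
        {
            args.Add(current.ToString().Trim()); // a top-level comma ends an argument
            current.Clear();
        }
        else
        {
            current.Append(c);
        }
    }

    string last = current.ToString().Trim();
    if (last.Length > 0 || args.Count > 0)
        args.Add(last);                          // func() yields an empty list
    return args;
}

foreach (string arg in SplitArguments("func('a,b', \"c)d\", 'it\\'s', g(1, 2))"))
    Console.WriteLine(arg);
```

Each argument keeps its quotes and escape sequences; unquoting and unescaping are left to the caller. The design choice is simply that commas only split arguments when the scanner is outside any string and at nesting depth 0.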


And to prove my point, allow me to share an actual solution with a regex that I think will match what you want:

^                               # Start of string
[^()'""]+\(                     # matches `func(`
                                #
(?>                             # START - Iterator (match each parameter)
 (?(param)\s*,(?>\s*))          # if it's not the 1st parameter, start with a `,`
 (?'param'                      # opens 'param' (main group, captures each parameter)
                                #
   (?>                          # Group: matches every char in parameter
      (?'qt'['""])              #  ALTERNATIVE 1: strings (matches ""foo"",'ba\'r','g)o\'o')
      (?:                       #   match anything inside quotes
        [^\\'""]+               #    any char except quotes or escapes
        |(?!\k'qt')['""]        #    or the quotes not used here (ie ""double'quotes"")
        |\\.                    #    or any escaped char
      )*                        #   repeat: *
      \k'qt'                    #   close quotes
   |  (?'parens'\()             #  ALTERNATIVE 2: `(` open nested parens (nested func)
   |  (?'-parens'\))            #  ALTERNATIVE 3: `)` close nested parens
   |  (?'braces'\{)             #  ALTERNATIVE 4: `{` open braces
   |  (?'-braces'})             #  ALTERNATIVE 5: `}` close braces
   |  [^,(){}\\'""]             #  ALTERNATIVE 6: anything else (var, funcName, operator, etc)
   |  (?(parens),)              #  ALTERNATIVE 7: `,` a comma if inside parens
   |  (?(braces),)              #  ALTERNATIVE 8: `,` a comma if inside braces
   )*                           # Repeat: *
                                # CONDITIONS:
  (?(parens)(?!))               #  a. balanced parens
  (?(braces)(?!))               #  b. balanced braces
  (?<!\s)                       #  c. no trailing spaces
                                #
 )                              # closes 'param'
)*                              # Repeat the whole thing once for every parameter
                                #
\s*\)\s*(?:;\s*)?               # matches `)` at the end if func(), maybe with a `;`
$                               # END

One-liner:

^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$
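To try the one-liner from C#, it can be pasted unchanged into a verbatim string literal (@"..."). The doubled "" in the pattern is C# verbatim-string escaping for a single double quote, which is why the pattern is written that way. A minimal sketch, assuming the input has already been reduced to the func(...) form; the sample input is my own. Every iteration of the param group leaves one Capture behind, so the parameters are read from Groups["param"].Captures.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's one-liner, unchanged, inside a C# verbatim string ("" means ").
var paramRegex = new Regex(@"^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$");

Match m = paramRegex.Match("func('a,b', 'ba\\'r', g(1,2))");
if (m.Success)
{
    // One Capture per parameter, in source order.
    foreach (Capture c in m.Groups["param"].Captures)
        Console.WriteLine(c.Value);
}
```

Each captured value keeps its surrounding quotes. Note also that the \s* trimming only runs before a comma, so whitespace right after the opening ( stays attached to the first parameter.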

Test online

As you can imagine by now (if you're still reading), even with an indented pattern and a comment for every construct, this regex is unreadable, quite difficult to maintain and almost impossible to debug... And I expect there are inputs that would make it fail.

Just in case a stubborn mind is still interested, here's a link to the logic behind it: Matching Nested Constructs with Balancing Groups (regular-expressions.info)
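The core idea behind that article, reduced to parentheses only: a balancing group pushes a capture for every ( and pops one for every ), a ) with nothing left to pop cannot match, and a conditional at the end rejects any ( left unclosed. A minimal sketch; the group name open is arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// (?'open'\() pushes onto the "open" capture stack, (?'-open'\)) pops from it
// (and fails if the stack is empty), and (?(open)(?!)) fails if anything remains.
var balanced = new Regex(@"^(?:[^()]|(?'open'\()|(?'-open'\)))*(?(open)(?!))$");

foreach (string s in new[] { "(a(b)c)", "(a(b)", "a)b(" })
    Console.WriteLine($"{s} -> {balanced.IsMatch(s)}");
```

The full pattern above applies the same push/pop/check trick twice, once for parens and once for braces.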

Try this

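            using System.Collections.Generic;
            using System.Text.RegularExpressions;

            string input = "<a href=\"javascript:func('data1','data2'...)\">...</a>";

            // Capture everything between "name(" and the first ")" into the "parameters" group.
            // [^\)]+ stops at the first ')', so a ')' inside a quoted argument cuts it short.
            string pattern1 = @"\w+\((?'parameters'[^\)]+)\)";

            Regex expr1 = new Regex(pattern1);
            Match match1 = expr1.Match(input);
            string parameters = match1.Groups["parameters"].Value;   // 'data1','data2'...

            // Take each run of word characters (letters, digits, underscore) as one value.
            // Quotes, spaces and punctuation are dropped, so 'a b' would yield "a" and "b".
            string pattern2 = @"\w+";
            Regex expr2 = new Regex(pattern2);
            MatchCollection matches = expr2.Matches(parameters);

            List<string> results = new List<string>();
            foreach (Match match in matches)
            {
                results.Add(match.Value);   // here: "data1", then "data2"
            }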

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM