
Best C# Regex pattern to get a function's parameter values in a raw string?

I'm parsing HTML code in a C# project.

Suppose we have the following string:

<a href="javascript:func('data1','data2'...)">...</a>

or, after the necessary .Substring() calls:

func('data1','data2'...)

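What would be the best Regex pattern to retrieve func()'s parameters without relying on the delimiter characters (' and '), since they may sometimes be part of a parameter's string?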

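Regular expressions should not be used to parse programming-language code, because such code is not a regular language. This article explains why: Can regular expressions be used to match nested patterns?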


To prove my point, allow me to share an actual regex solution that I think does what you are asking for:

^                               # Start of string
[^()'""]+\(                     # matches `func(`
                                #
(?>                             # START - Iterator (match each parameter)
 (?(param)\s*,(?>\s*))          # if it's not the 1st parameter, start with a `,`
 (?'param'                      # opens 'param' (main group, captures each parameter)
                                #
   (?>                          # Group: matches every char in parameter
      (?'qt'['""])              #  ALTERNATIVE 1: strings (matches ""foo"",'ba\'r','g)o\'o')
      (?:                       #   match anything inside quotes
        [^\\'""]+               #    any char except quotes or escapes
        |(?!\k'qt')['""]        #    or the quotes not used here (ie ""double'quotes"")
        |\\.                    #    or any escaped char
      )*                        #   repeat: *
      \k'qt'                    #   close quotes
   |  (?'parens'\()             #  ALTERNATIVE 2: `(` open nested parens (nested func)
   |  (?'-parens'\))            #  ALTERNATIVE 3: `)` close nested parens
   |  (?'braces'\{)             #  ALTERNATIVE 4: `{` open braces
   |  (?'-braces'})             #  ALTERNATIVE 5: `}` close braces
   |  [^,(){}\\'""]             #  ALTERNATIVE 6: anything else (var, funcName, operator, etc)
   |  (?(parens),)              #  ALTERNATIVE 7: `,` a comma if inside parens
   |  (?(braces),)              #  ALTERNATIVE 8: `,` a comma if inside braces
   )*                           # Repeat: *
                                # CONDITIONS:
  (?(parens)(?!))               #  a. balanced parens
  (?(braces)(?!))               #  b. balanced braces
  (?<!\s)                       #  c. no trailing spaces
                                #
 )                              # closes 'param'
)*                              # Repeat the whole thing once for every parameter
                                #
\s*\)\s*(?:;\s*)?               # matches `)` at the end of func(), maybe with a `;`
$                               # END

One-liner:

^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$

Test online
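As a rough sketch of how the one-liner might be used from C#: the `param` group matches once per parameter, and .NET keeps every capture of a repeated group in `Group.Captures`. The class name `ParamRegexDemo`, the helper `GetParameters` and the sample input are assumptions made up for this illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParamRegexDemo
{
    // The one-liner above as a C# verbatim string ("" is an escaped double quote).
    private static readonly Regex CallPattern = new Regex(
        @"^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$");

    // Hypothetical helper: returns each raw parameter (quotes included),
    // or an empty list when the input is not a single well-formed call.
    public static List<string> GetParameters(string call)
    {
        var result = new List<string>();
        Match match = CallPattern.Match(call);
        if (!match.Success)
        {
            return result;
        }

        // 'param' is captured once per parameter; Captures holds all of them.
        foreach (Capture capture in match.Groups["param"].Captures)
        {
            result.Add(capture.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample: a comma inside quotes, a quote inside a string, a nested call.
        foreach (string p in GetParameters("func('da,ta1', \"it's\", g(1,2))"))
        {
            Console.WriteLine(p);
        }
    }
}
```

Note that the pattern is anchored with `^` and `$`, so it expects the bare call (the string left after the `.Substring()` step), not the whole `<a>` tag.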

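As you can imagine by now (if you're still reading), even with the pattern indented and a comment on every construct, this regex is unreadable, hard to maintain and nearly impossible to debug... and I'm guessing there are edge cases that will make it fail.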

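Just in case a stubborn mind is still interested, here's a link to the logic behind it: Matching Nested Constructs with Balancing Groups (regular-expressions.info)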
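For comparison, here is a minimal sketch of the non-regex route: a hand-written scanner that walks the characters once, tracking the current quote character and the bracket depth, and splits only on commas at depth zero outside strings. It is not part of the original answer. The names `ArgumentSplitter` and `Split` are hypothetical, and the sketch assumes reasonably well-formed input; it does not check that brackets are balanced and knows nothing about JavaScript comments, regex literals or template strings.

```csharp
using System;
using System.Collections.Generic;

public static class ArgumentSplitter
{
    // Hypothetical helper: splits the text between the first '(' and the
    // last ')' of a call such as func('a', g(1,2)) into top-level arguments.
    public static List<string> Split(string call)
    {
        int open = call.IndexOf('(');
        int close = call.LastIndexOf(')');
        if (open < 0 || close < open)
        {
            throw new FormatException("Expected a call of the form name(...).");
        }

        var args = new List<string>();
        int depth = 0;       // nesting level of (), [] and {}
        char quote = '\0';   // active string delimiter, or '\0' outside strings
        int start = open + 1;

        for (int i = open + 1; i < close; i++)
        {
            char c = call[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == quote) quote = '\0';  // closing quote
            }
            else if (c == '\'' || c == '"') quote = c;
            else if (c == '(' || c == '[' || c == '{') depth++;
            else if (c == ')' || c == ']' || c == '}') depth--;
            else if (c == ',' && depth == 0)
            {
                // Top-level comma: everything since 'start' is one argument.
                args.Add(call.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }

        // Last argument; an empty list is returned for a call like func().
        string last = call.Substring(start, close - start).Trim();
        if (last.Length > 0 || args.Count > 0)
        {
            args.Add(last);
        }
        return args;
    }

    public static void Main()
    {
        foreach (string arg in Split("func('da,ta1', \"it's\", g(1,2))"))
        {
            Console.WriteLine(arg);
        }
    }
}
```

The arguments come back raw, quotes included, so stripping quotes or unescaping is left to the caller. Each rule is an ordinary `if` branch, which makes the scanner easier to step through in a debugger than the balancing-group pattern.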

Try this:

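using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "<a href=\"javascript:func('data1','data2'...)\">...</a>";

// Step 1: capture everything between the parentheses of the call.
string pattern1 = @"\w+\((?'parameters'[^\)]+)\)";
Regex expr1 = new Regex(pattern1);
Match match1 = expr1.Match(input);
string parameters = match1.Groups["parameters"].Value;

// Step 2: pull each run of word characters out of the captured text.
// Note: this drops the quotes and splits values that contain non-word characters.
string pattern2 = @"\w+";
Regex expr2 = new Regex(pattern2);
MatchCollection matches = expr2.Matches(parameters);

List<string> results = new List<string>();
foreach (Match match in matches)
{
    results.Add(match.Value);
}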
