Best C# Regex pattern to get a function's parameter values in a raw string?
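I'm parsing HTML code in a C# project.
Suppose we have the following string:
<a href="javascript:func('data1','data2'...)">...</a>
Or, after the necessary .Substring() calls:
func('data1','data2'...)
What would be the best Regex pattern to retrieve func()'s parameters without relying on the delimiter characters (' and '), since they may sometimes be part of a parameter's string?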
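You shouldn't use regular expressions to parse programming-language code, because it isn't a regular language. This post explains why: Can regular expressions be used to match nested patterns?
To prove my point, allow me to share an actual regex solution that I think matches what you're asking for: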
^ # Start of string
[^()'""]+\( # matches `func(`
#
(?> # START - Iterator (match each parameter)
(?(param)\s*,(?>\s*)) # if it's not the 1st parameter, start with a `,`
(?'param' # opens 'param' (main group, captures each parameter)
#
(?> # Group: matches every char in parameter
(?'qt'['""]) # ALTERNATIVE 1: strings (matches ""foo"",'ba\'r','g)o\'o')
(?: # match anything inside quotes
[^\\'""]+ # any char except quotes or escapes
|(?!\k'qt')['""] # or the quotes not used here (ie ""double'quotes"")
|\\. # or any escaped char
)* # repeat: *
\k'qt' # close quotes
| (?'parens'\() # ALTERNATIVE 2: `(` open nested parens (nested func)
| (?'-parens'\)) # ALTERNATIVE 3: `)` close nested parens
| (?'braces'\{) # ALTERNATIVE 4: `{` open braces
| (?'-braces'}) # ALTERNATIVE 5: `}` close braces
| [^,(){}\\'""] # ALTERNATIVE 6: anything else (var, funcName, operator, etc)
| (?(parens),) # ALTERNATIVE 7: `,` a comma if inside parens
| (?(braces),) # ALTERNATIVE 8: `,` a comma if inside braces
)* # Repeat: *
# CONDITIONS:
(?(parens)(?!)) # a. balanced parens
(?(braces)(?!)) # b. balanced braces
(?<!\s) # c. no trailing spaces
#
) # closes 'param'
)* # Repeat the whole thing once for every parameter
#
\s*\)\s*(?:;\s*)? # matches `)` at the end of func(), maybe with a `;`
$ # END
One-liner:
^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$
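For anyone who wants to try the pattern, here is a minimal sketch of how its captures might be read in C#. The class and method names (`ParamRegexDemo`, `ExtractParameters`) are made up for illustration and are not part of the answer. The doubled `""` in the pattern are the verbatim-string escape for a double quote, and the commented multi-line version would additionally need `RegexOptions.IgnorePatternWhitespace`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParamRegexDemo
{
    // The one-liner from the answer, unchanged. Inside a C# verbatim string,
    // each "" stands for a single double-quote character.
    private static readonly Regex CallPattern = new Regex(
        @"^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$");

    // Hypothetical helper: returns the raw text of each parameter,
    // or an empty list when the input does not match the pattern.
    public static List<string> ExtractParameters(string call)
    {
        var result = new List<string>();
        Match m = CallPattern.Match(call);
        if (!m.Success) return result;

        // The 'param' group matches once per parameter, so every value is in
        // Captures; Value alone would only give the last one.
        foreach (Capture c in m.Groups["param"].Captures)
            result.Add(c.Value);
        return result;
    }
}
```

Each returned item is the parameter as written in the source, quotes included, so a string literal still has to be unquoted and unescaped afterwards.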
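As you can imagine by now (if you're still reading), even with the pattern indented and every construct commented, this regex is unreadable, hard to maintain, and almost impossible to debug... and I'd guess there are edge cases that make it fail.
In case a stubborn mind is still interested, here's a link to the logic behind it: Matching Nested Constructs with Balancing Groups (regular-expressions.info)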
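For comparison, the alternative this answer argues for (parsing instead of pattern matching) could look something like the sketch below. It is not from the original answer: `ArgumentSplitter` and `Split` are made-up names. The code assumes JavaScript-style string literals with backslash escapes, and it only counts nesting depth, so it does not check that each bracket is closed by the matching kind.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ArgumentSplitter
{
    // Splits a call such as  func('a,b', g(1, 2), {x: 1})  into its
    // top-level arguments, returned as raw source text.
    public static List<string> Split(string call)
    {
        int open = call.IndexOf('(');
        if (open < 0) throw new FormatException("No opening parenthesis found.");

        var args = new List<string>();
        var current = new StringBuilder();
        int depth = 0;       // nesting level of (), {} and [] inside the argument list
        char quote = '\0';   // quote character of the literal being read, '\0' if none

        for (int i = open + 1; i < call.Length; i++)
        {
            char c = call[i];

            if (quote != '\0')
            {
                // Inside a string literal: commas and brackets are plain text.
                current.Append(c);
                if (c == '\\' && i + 1 < call.Length)
                    current.Append(call[++i]);   // copy the escaped character as-is
                else if (c == quote)
                    quote = '\0';                // closing quote reached
                continue;
            }

            switch (c)
            {
                case '\'':
                case '"':
                    quote = c;
                    current.Append(c);
                    break;
                case '(':
                case '{':
                case '[':
                    depth++;
                    current.Append(c);
                    break;
                case ')':
                case '}':
                case ']':
                    if (depth == 0)
                    {
                        if (c != ')') throw new FormatException("Unbalanced bracket.");
                        // This is the closing parenthesis of the call itself.
                        string last = current.ToString().Trim();
                        if (last.Length > 0 || args.Count > 0) args.Add(last);
                        return args;
                    }
                    depth--;
                    current.Append(c);
                    break;
                case ',':
                    if (depth == 0)
                    {
                        // A top-level comma ends the current argument.
                        args.Add(current.ToString().Trim());
                        current.Clear();
                    }
                    else
                    {
                        current.Append(c);
                    }
                    break;
                default:
                    current.Append(c);
                    break;
            }
        }
        throw new FormatException("Missing closing parenthesis.");
    }
}
```

The loop is longer than the regex, but each rule (quotes, escapes, nesting, separators) sits in its own branch, which is what makes it easier to step through and extend.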
Try this:
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "<a href=\"javascript:func('data1','data2'...)\">...</a>";

// Step 1: capture everything between "func(" and the first ")".
string pattern1 = @"\w+\((?'parameters'[^\)]+)\)";
Regex expr1 = new Regex(pattern1);
Match match1 = expr1.Match(input);
string parameters = match1.Groups["parameters"].Value;

// Step 2: pull each run of word characters out of the captured text.
string pattern2 = @"\w+";
Regex expr2 = new Regex(pattern2);
MatchCollection matches = expr2.Matches(parameters);

List<string> results = new List<string>();
foreach (Match match in matches)
{
    results.Add(match.Value);   // "data1", then "data2"
}
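Note that the snippet above stops at the first `)` and keeps only word characters, so a parameter such as 'da)ta1' or 'it\'s' would be cut or split. A possible variation for that case is sketched below. It is an assumption-laden sketch, not part of the answer: `QuotedArgs` and `Extract` are made-up names, only single-quoted literals are handled, and the call's closing parenthesis is taken to be the last `)` in the input.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedArgs
{
    // Hypothetical helper: returns the contents of every single-quoted literal
    // in the argument list, treating backslash escapes such as \' as content.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();

        // Step 1: grab everything between "name(" and the last ")" in the input.
        Match call = Regex.Match(input, @"\w+\((?<parameters>.*)\)");
        if (!call.Success) return results;

        // Step 2: a literal is ', then any mix of ordinary characters and
        // backslash-escaped characters, then the closing '.
        string literal = @"'((?:[^'\\]|\\.)*)'";
        foreach (Match m in Regex.Matches(call.Groups["parameters"].Value, literal))
            results.Add(m.Groups[1].Value);   // text between the quotes, escapes left in place
        return results;
    }
}
```

Escape sequences are returned untouched, so the caller still has to decide whether to unescape them.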