Best C# Regex pattern to get a function's parameter values in a raw string?
I'm parsing HTML code in a C# project.
Suppose we have the following string:
<a href="javascript:func('data1','data2'...)">...</a>
or, after the necessary .Substring() calls:
func('data1','data2'...)
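What is the best Regex pattern to retrieve the parameters of func() without relying on the delimiter characters (' and ,), since they can sometimes be part of a parameter string?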
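You shouldn't use a regular expression to parse programming-language code, because it is not a regular language. This article explains why: Can regular expressions be used to match nested patterns?
To prove my point, allow me to share an actual regex solution that I believe meets your requirements: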
^ # Start of string
[^()'""]+\( # matches `func(`
#
(?> # START - Iterator (match each parameter)
(?(param)\s*,(?>\s*)) # if it's not the 1st parameter, start with a `,`
(?'param' # opens 'param' (main group, captures each parameter)
#
(?> # Group: matches every char in parameter
(?'qt'['""]) # ALTERNATIVE 1: strings (matches ""foo"",'ba\'r','g)o\'o')
(?: # match anything inside quotes
[^\\'""]+ # any char except quotes or escapes
|(?!\k'qt')['""] # or the quotes not used here (ie ""double'quotes"")
|\\. # or any escaped char
)* # repeat: *
\k'qt' # close quotes
| (?'parens'\() # ALTERNATIVE 2: `(` open nested parens (nested func)
| (?'-parens'\)) # ALTERNATIVE 3: `)` close nested parens
| (?'braces'\{) # ALTERNATIVE 4: `{` open braces
| (?'-braces'}) # ALTERNATIVE 5: `}` close braces
| [^,(){}\\'""] # ALTERNATIVE 6: anything else (var, funcName, operator, etc)
| (?(parens),) # ALTERNATIVE 7: `,` a comma if inside parens
| (?(braces),) # ALTERNATIVE 8: `,` a comma if inside braces
)* # Repeat: *
# CONDITIONS:
(?(parens)(?!)) # a. balanced parens
(?(braces)(?!)) # b. balanced braces
(?<!\s) # c. no trailing spaces
#
) # closes 'param'
)* # Repeat the whole thing once for every parameter
#
\s*\)\s*(?:;\s*)? # matches `)` at the end of func(), maybe with a `;`
$ # END
One-liner:
^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$
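For anyone who wants to try the one-liner, here is a minimal sketch of how it might be called from C#. The class name ParamRegexDemo and the method GetParameters are hypothetical names chosen for illustration. The sketch assumes the pattern is pasted into a verbatim string, which is why each double quote appears as "". Each parameter is read from the Captures collection of the 'param' group. The pattern is anchored with ^ and $, so it expects the already-trimmed func(...) text, not the whole <a> tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParamRegexDemo
{
    // The one-liner from the answer, inside a C# verbatim string.
    // In a verbatim string, "" stands for a single double-quote character.
    private static readonly Regex CallPattern = new Regex(
        @"^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$");

    // Hypothetical helper: returns one string per captured parameter,
    // or an empty list when the input does not match the pattern.
    public static List<string> GetParameters(string call)
    {
        var result = new List<string>();
        Match m = CallPattern.Match(call);
        if (!m.Success)
        {
            return result;
        }

        // The 'param' group is matched once per parameter, so every
        // parameter is available through the group's Captures collection.
        foreach (Capture c in m.Groups["param"].Captures)
        {
            result.Add(c.Value);
        }
        return result;
    }
}
```

Under these assumptions, GetParameters(@"func('a,b','c\'d')") should return the two strings 'a,b' and 'c\'d' with their quotes still attached. The commented multi-line version needs RegexOptions.IgnorePatternWhitespace if it is used instead.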
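As you can imagine by now (if you're still reading), even with the pattern indented and a comment on every construct, this regex is unreadable, hard to maintain, and almost impossible to debug... and I'm guessing there are edge cases that will make it fail.
Just in case a stubborn mind is still interested, here's a link to the logic behind it: Matching Nested Constructs with Balancing Groups (regular-expressions.info)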
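The answer does not spell out a non-regex alternative, so the following is only a rough sketch of what one could look like: a small hand-written scanner that walks the text once, tracking the current quote character and the nesting depth. ArgSplitter and SplitArguments are hypothetical names, and the sketch assumes the input already has the func(...) shape and that a backslash escapes the next character inside quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ArgSplitter
{
    // Hypothetical helper: splits the text between the outermost parentheses
    // of something like "func('a,b', g(1,2))" into top-level arguments.
    public static List<string> SplitArguments(string call)
    {
        int open = call.IndexOf('(');
        int close = call.LastIndexOf(')');
        if (open < 0 || close < open)
        {
            throw new FormatException("Expected input shaped like name(args).");
        }

        var args = new List<string>();
        var current = new StringBuilder();
        int depth = 0;       // nesting level of (), {} and [] outside quotes
        char quote = '\0';   // quote character we are inside, or '\0' if none

        for (int i = open + 1; i < close; i++)
        {
            char c = call[i];
            if (quote != '\0')
            {
                current.Append(c);
                if (c == '\\' && i + 1 < close)
                {
                    current.Append(call[++i]);   // copy the escaped character as-is
                }
                else if (c == quote)
                {
                    quote = '\0';                // closing quote reached
                }
            }
            else if (c == '\'' || c == '"')
            {
                quote = c;
                current.Append(c);
            }
            else if (c == '(' || c == '{' || c == '[')
            {
                depth++;
                current.Append(c);
            }
            else if (c == ')' || c == '}' || c == ']')
            {
                depth--;
                current.Append(c);
            }
            else if (c == ',' && depth == 0)
            {
                // A comma outside quotes and nesting ends the current argument.
                args.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        if (current.Length > 0 || args.Count > 0)
        {
            args.Add(current.ToString().Trim());
        }
        return args;
    }
}
```

Each branch here corresponds to one alternative in the regex above, but it can be stepped through in a debugger. The sketch does not check that bracket types pair up correctly, so it remains an illustration, not a JavaScript parser.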
Try this:
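using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "<a href=\"javascript:func('data1','data2'...)\">...</a>";

// Step 1: capture everything between "name(" and the first ")" that follows.
string pattern1 = @"\w+\((?'parameters'[^\)]+)\)";
Regex expr1 = new Regex(pattern1);
Match match1 = expr1.Match(input);
string parameters = match1.Groups["parameters"].Value;

// Step 2: pull each run of word characters out of the captured text.
// Note: this drops the quotes and splits on any non-word character.
string pattern2 = @"\w+";
Regex expr2 = new Regex(pattern2);
MatchCollection matches = expr2.Matches(parameters);

List<string> results = new List<string>();
foreach (Match match in matches)
{
    results.Add(match.Value);
}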