Problems with named capturing in C# regex
I've been struggling with this for a while:
var matches = Regex.Matches("<h2>hello world</h2>",
@"<(?<tag>[^\s/>]+)(?<innerHtml>.*)(?<closeTag>[^\s>]+)>",
RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
string tag = matches[0].Groups["tag"].Value; // "h2"
string innerHtml = matches[0].Groups["innerHtml"].Value; // ">hello world</h"
string closeTag = matches[0].Groups["closeTag"].Value; // "2"
As can be seen, tag works as expected, while innerHtml and closeTag do not. Any advice? Thanks.
Update
The input string may vary; this is another scenario: "<div class='myclass'><h2>hello world</h2></div>"
Try matching the > and </ outside of the capture groups, like this:
var matches = Regex.Matches("<h2>hello world</h2>",
@"<(?<tag>[^\s/>]+)>(?<innerHtml>.*)</(?<closeTag>[^\s>]+)>",
RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
Update: a more specific example that should be a little more flexible:
var matches = Regex.Matches(
"<div class='myclass'><h2>hello world</h2></div>",
@"<(?<tag>[^\s>]+) #Opening tag
\s*(?<attributes>[^>]*)\s*> #Attributes inside tag (optional)
(?<innerHtml>.*) #Inner Html
</(?<closeTag>\1)> #Closing tag, must match opening tag",
RegexOptions.IgnoreCase |
RegexOptions.Compiled |
RegexOptions.Multiline |
RegexOptions.IgnorePatternWhitespace);
string tag = matches[0].Groups["tag"].Value; // "div"
string attr = matches[0].Groups["attributes"].Value; // "class='myclass'"
string innerHtml = matches[0].Groups["innerHtml"].Value; // "<h2>hello world</h2>"
string closeTag = matches[0].Groups["closeTag"].Value; // "div"
You want the Singleline option, not Multiline. Singleline enables . to match linefeeds, while Multiline changes the behavior of the anchors (^ and $), which you aren't using.
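To make the difference concrete, here is a minimal sketch that runs the pattern from the earlier answer against an input containing a line break. The class name SinglelineDemo and the helper MatchElement are made up for illustration; only Regex.Match and the RegexOptions values are real .NET APIs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: applies the tag pattern with the given options.
    public static Match MatchElement(string input, RegexOptions options)
    {
        return Regex.Match(
            input,
            @"<(?<tag>[^\s/>]+)>(?<innerHtml>.*)</(?<closeTag>[^\s>]+)>",
            options);
    }

    public static void Main()
    {
        string input = "<h2>hello\nworld</h2>";

        // Multiline only changes ^ and $; '.' still stops at '\n', so there is no match.
        Match multi = MatchElement(input, RegexOptions.Multiline);
        Console.WriteLine(multi.Success); // False

        // Singleline lets '.' cross the line break, so the whole element matches.
        Match single = MatchElement(input, RegexOptions.Singleline);
        Console.WriteLine(single.Success);                                    // True
        Console.WriteLine(single.Groups["innerHtml"].Value == "hello\nworld"); // True
    }
}
```

The two options are independent flags, so they can also be combined if a pattern needs both behaviors.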
Also, if you want the closing tag to have the same name as the opening tag, you should use a backreference. Here I've used '' as the name delimiters instead of <> to reduce confusion:
var matches = Regex.Matches("<h2>hello world</h2>",
@"<(?'tag'[^/>]+)>(?'innerHtml'.*)</\k'tag'>",
RegexOptions.IgnoreCase | RegexOptions.Singleline);
// The '>' after the tag group keeps it out of innerHtml, which is then "hello world".
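As a rough check of what the backreference buys you, the sketch below tests a matching and a mismatched closing tag. The class BackreferenceDemo and the method IsBalanced are hypothetical names. The pattern is a variant of the one above, written with an explicit > after the tag name and with \s excluded from the name; both choices are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // \k'tag' must repeat exactly what the 'tag' group captured
    // (case-insensitively here, because IgnoreCase is set).
    private const string Pattern = @"<(?'tag'[^\s/>]+)>(?'innerHtml'.*)</\k'tag'>";

    // Hypothetical helper: true when the input contains an element whose
    // closing tag name repeats the opening tag name.
    public static bool IsBalanced(string input)
    {
        return Regex.IsMatch(input, Pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("<h2>hello world</h2>")); // True
        Console.WriteLine(IsBalanced("<h2>hello world</h3>")); // False
    }
}
```

Without the backreference, a pattern such as the one in the question would accept the second input as well, since any name is allowed in the closing tag.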
And you don't need the Compiled option. All it does is make it more expensive to create the Regex object, in exchange for a performance gain that you almost certainly don't need and won't notice.
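If you would rather measure the construction cost than take it on faith, something like the following sketch could be used. CompiledCostDemo and Build are hypothetical names, and a single Stopwatch reading is only a rough indication: the first construction also pays one-time warm-up costs, so repeat the measurement before drawing conclusions. No timings are shown because they depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CompiledCostDemo
{
    private const string Pattern = @"<(?'tag'[^\s/>]+)>(?'innerHtml'.*)</\k'tag'>";

    // Hypothetical helper: builds a Regex with the given options and
    // reports how long the constructor call took.
    public static Regex Build(RegexOptions options, out TimeSpan elapsed)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Regex regex = new Regex(Pattern, options);
        sw.Stop();
        elapsed = sw.Elapsed;
        return regex;
    }

    public static void Main()
    {
        Regex interpreted = Build(RegexOptions.Singleline, out TimeSpan plainTime);
        Regex compiled = Build(RegexOptions.Singleline | RegexOptions.Compiled, out TimeSpan compiledTime);

        Console.WriteLine($"without Compiled: {plainTime.TotalMilliseconds} ms");
        Console.WriteLine($"with Compiled:    {compiledTime.TotalMilliseconds} ms");

        // Both objects match the same input; only construction and matching cost differ.
        Console.WriteLine(interpreted.IsMatch("<h2>hello world</h2>")); // True
        Console.WriteLine(compiled.IsMatch("<h2>hello world</h2>"));    // True
    }
}
```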