Regex match for tags within tags and last matching tag
I am trying to parse some XML tags whose data contains escaped strings. Some samples:
other tags with or without newlines
<tag name="abc1" type="bcd" value="test"><tag name="abc2" type="bcd" value="test">
other tags with or without newlines
<tag name="abc2" type="bcd" value="<w:test xmlns:wst="http://schemas.xmlsoap.org/ws/2005/02/trust"><a xmlns:"a:b:c:ddd:">XEduAjr8MoV</a></w:test>">
Basically, I need to find the values in tags that are embedded within other strings. Something like this:
<tag name="wwww" type="wwww" value="SOME HTML ESCAPED STRING WITH NEWLINES">
Here is what I have:
<tag name="(?<name>\w*)" type="(?<id>\w*)" value="(?<value>.*)">
I am using this C# code:
var regex = new Regex(regstr, RegexOptions.Multiline);
MatchCollection mc = regex.Matches(sourcestring);
I am running into problems with multiple matches being combined because of (?<value>.*), for example when both tags are on the same line:
<tag name="abc1" type="bcd" value="test"><tag name="abc2" type="bcd" value="test">
Is there any way to get around this? Is there a better way?
It's not advisable to parse XML files with regex patterns. One reason is that XML allows deep nesting, which a plain regex pattern cannot track reliably.
It's well known that you should not use regex to parse XHTML unless the input has no complex tags and no unusual characters.
However, if you want to use regex for your specific example, you have to use non-greedy (or lazy) quantifiers:
<tag name="(?<name>\w*?)" type="(?<id>\w*?)" value="(?<value>.*?)">
The lazy quantifier that matters is the one in (?<value>.*?). I also made the \w* quantifiers in the name and id groups lazy, since it is more secure, but it is not needed.
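To see the difference between the greedy and the lazy pattern on the same-line sample from the question, here is a minimal sketch. The class name TagRegexDemo and the helper ExtractValues are hypothetical names used only for illustration. It also assumes RegexOptions.Singleline instead of the question's RegexOptions.Multiline, because Multiline only changes how ^ and $ behave, while Singleline is the option that lets . match a newline inside a value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagRegexDemo
{
    // Greedy pattern from the question: .* runs on to the LAST "> in the input.
    public const string GreedyPattern =
        "<tag name=\"(?<name>\\w*)\" type=\"(?<id>\\w*)\" value=\"(?<value>.*)\">";

    // Lazy pattern from the answer: .*? stops at the FIRST "> that completes a match.
    public const string LazyPattern =
        "<tag name=\"(?<name>\\w*?)\" type=\"(?<id>\\w*?)\" value=\"(?<value>.*?)\">";

    // Hypothetical helper: collects the "value" group of every match.
    public static List<string> ExtractValues(string pattern, string input)
    {
        var values = new List<string>();
        // Assumption: Singleline is used so that '.' also matches '\n'.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
        {
            values.Add(m.Groups["value"].Value);
        }
        return values;
    }

    public static void Main()
    {
        string source =
            "<tag name=\"abc1\" type=\"bcd\" value=\"test\">" +
            "<tag name=\"abc2\" type=\"bcd\" value=\"test\">";

        // Greedy: both tags are swallowed into a single match.
        Console.WriteLine(ExtractValues(GreedyPattern, source).Count); // prints 1

        // Lazy: each tag is matched separately.
        Console.WriteLine(ExtractValues(LazyPattern, source).Count);   // prints 2
    }
}
```

One caveat with this approach: the lazy pattern ends a value at the first "> it encounters, so a value that itself contains "> (as in the third sample of the question, if the inner markup is not actually escaped) would be cut short. In that case the value has to be properly escaped in the source, or a real parser is the safer choice.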