Java regex XML parsing
I'm trying to find an element in XML, from its opening tag to its closing tag, and replace it with a blank. A sample XML looks like this:
<lins>
<lin index="1"> ...<feature>Something</feature>... </lin>
<lin index="2">...<feature>Something</feature>... </lin>
<lin index="3">...<feature>Something</feature>....</lin>
<lin index="1">...<feature>Icom</feature>... </lin>
<lin index="2">...<feature>Icom</feature>... </lin>
</lins>
I need to remove everything from <lin> to </lin> whenever I see Icom in between.
<lin\\s(.+?Icom.+?)+</lin>
is removing all lin items, since it matches from the first opening <lin> tag to the last closing </lin> tag. I'd greatly appreciate it if you could suggest a way to do this. Also, I cannot use XML parsers in my situation.
String result = subject.replaceAll("(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>", "");
should do this, unless you have <lin> tags nested inside each other (or inside comments/strings).
Explanation:
<lin\b # Match <lin (but not link or linen)
(?: # Match...
(?!</lin) # as long as we're not at a closing tag
. # any character
)* # any number of times.
Icom # Match Icom
(?:(?!</lin).)* # (as above:) Match any character except closing tag
</lin> # Match closing tag
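To try this out, here is a minimal sketch that applies the pattern to the sample from the question. The class name RemoveIcomLin and the method removeIcomLins are made up for illustration; only the regex itself comes from the answer above.

```java
import java.util.regex.Pattern;

class RemoveIcomLin {
    // Pattern from the answer: a <lin ...> element that contains "Icom"
    // and has no "</lin" between its opening tag and its closing tag.
    static final Pattern ICOM_LIN = Pattern.compile(
        "(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>");

    // Hypothetical helper: returns the input with every matching element removed.
    static String removeIcomLins(String xml) {
        return ICOM_LIN.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample from the question (closing tag written as </lins>).
        String xml =
              "<lins>\n"
            + "<lin index=\"1\"> ...<feature>Something</feature>... </lin>\n"
            + "<lin index=\"2\">...<feature>Something</feature>... </lin>\n"
            + "<lin index=\"3\">...<feature>Something</feature>....</lin>\n"
            + "<lin index=\"1\">...<feature>Icom</feature>... </lin>\n"
            + "<lin index=\"2\">...<feature>Icom</feature>... </lin>\n"
            + "</lins>";
        System.out.println(removeIcomLins(xml));
    }
}
```

Each removed element leaves its line break behind, so the output contains empty lines where the Icom entries were.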
You can't do it with a regexp in the general case, where tags may be nested.
For this example:
<tag>
<tag> something </tag>
</tag>
<tag>
</tag>
If you use the regexp "<tag>(.*)</tag>" (with . also matching line breaks), your group will be this:
<tag> something </tag>
</tag>
<tag>
And if you use the regexp "<tag>(.*?)</tag>", your group will be this:
<tag> something
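The two results above can be reproduced with a short sketch like the one below. The class GreedyVsLazy and its helper firstGroup are hypothetical names, and the sketch assumes DOTALL mode so that the dot can cross line breaks in the multi-line example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        // DOTALL lets '.' match line breaks, which the multi-line example needs.
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<tag>\n<tag> something </tag>\n</tag>\n<tag>\n</tag>";
        // Greedy: runs from the first <tag> to the last </tag>.
        System.out.println("greedy: [" + firstGroup("<tag>(.*)</tag>", xml) + "]");
        // Lazy: stops at the first </tag>, which belongs to the inner element.
        System.out.println("lazy:   [" + firstGroup("<tag>(.*?)</tag>", xml) + "]");
    }
}
```

Neither group is the content of the outer element: the greedy one runs on into the next element, and the lazy one stops at the inner closing tag.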
You should use something like a stack to find the closing tag that belongs to a given opening tag.
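One way the stack idea could look in Java is sketched below. It is not a full XML parser: it assumes there are no self-closing tags, comments or CDATA sections, and the names TagMatcher and topLevelElements are invented for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagMatcher {
    // Returns the full text of every outermost <name>...</name> element in s.
    // Assumption: no self-closing tags, comments or CDATA in the input.
    static List<String> topLevelElements(String s, String name) {
        // Matches an opening or a closing tag of the given name;
        // group 1 is "/" for a closing tag and empty for an opening tag.
        Pattern tag = Pattern.compile("<(/?)" + Pattern.quote(name) + "\\b[^>]*>");
        Matcher m = tag.matcher(s);
        Deque<Integer> open = new ArrayDeque<>(); // start offsets of unclosed opening tags
        List<String> result = new ArrayList<>();
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                open.push(m.start());            // opening tag: remember where it starts
            } else if (!open.isEmpty()) {
                int start = open.pop();          // closing tag: pairs with the latest opening tag
                if (open.isEmpty()) {
                    // The stack is empty again, so an outermost element just ended.
                    result.add(s.substring(start, m.end()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<tag>\n<tag> something </tag>\n</tag>\n<tag>\n</tag>";
        for (String element : topLevelElements(xml, "tag")) {
            System.out.println("[" + element + "]");
        }
    }
}
```

For the example above this yields two elements: the outer element with the nested one inside it, and the empty second element.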
I think you need to add more groups to the regexp.
Add a group for the precondition that starts the match (the opening of the lin tag),
then a group for the stuff in between, a group for Icom, and so on.
So off the top of my head, my regex would look like:
(<lin\ index\=)(\w+Icom\w+)(\<\/lin>)
Note that the escaping might be slightly off, but in essence you need more groups and some less greedy matchers.
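As a rough sketch of where that idea could lead, the pattern below uses three groups and reluctant quantifiers. It differs from the regex above: \w+ cannot match the quotes, angle brackets and spaces in the sample, so the middle group borrows the (?!</lin>) lookahead from the earlier answer instead. The class name GroupedLinRemover is made up, and the sketch assumes index is the only attribute on lin, as in the sample.

```java
import java.util.regex.Pattern;

class GroupedLinRemover {
    // Group 1: the opening tag, assuming index is its only attribute.
    // Group 2: a body that contains "Icom" and never crosses a </lin>.
    // Group 3: the closing tag.
    static final Pattern GROUPED = Pattern.compile(
          "(<lin\\s+index=\"[^\"]*\"\\s*>)"
        + "((?:(?!</lin>).)*?Icom(?:(?!</lin>).)*?)"
        + "(</lin>)",
        Pattern.DOTALL);

    // Hypothetical helper: removes every lin element whose body contains "Icom".
    static String remove(String xml) {
        return GROUPED.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<lin index=\"1\">...<feature>Something</feature>... </lin>\n"
                   + "<lin index=\"2\">...<feature>Icom</feature>... </lin>";
        System.out.println(remove(xml));
    }
}
```

The groups are not needed for plain removal, but they make it possible to keep parts of the match, for example by using "$1$3" as the replacement to empty the element instead of deleting it.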