Java Regex XML parsing

Question

I'm trying to find a tag from its opening to its closing in XML and replace it with an empty string. A sample XML looks like this:

<lins>
  <lin index="1"> ...<feature>Something</feature>... </lin>
  <lin index="2">...<feature>Something</feature>... </lin>
  <lin index="3">...<feature>Something</feature>....</lin>

  <lin index="1">...<feature>Icom</feature>... </lin>
  <lin index="2">...<feature>Icom</feature>... </lin>
</lins>

I need to remove everything from <lin> to </lin> whenever Icom appears in between.

<lin\\s(.+?Icom.+?)+</lin> removes all lin items, since it matches from the first opening <lin> tag to the last closing lin tag. I'd greatly appreciate it if you could suggest a way to do this. Also, I cannot use XML parsers in my situation.

Answer 1

String result = subject.replaceAll("(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>", "");

should do this, unless you have <lin> tags nested inside each other (or inside comments/strings).

Explanation:

<lin\b              # Match <lin (but not link or linen)
(?:                 # Match...
 (?!</lin)          # as long as we're not at a closing tag
 .                  # any character
)*                  # any number of times.
Icom                # Match Icom
(?:(?!</lin).)*     # (as above:) Match any character except closing tag
</lin>              # Match closing tag
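
To make the difference concrete, here is a minimal sketch that runs both the pattern from the question and the pattern above on a one-line version of the sample. The class and method names (LinRemover, removeIcomLins, removeNaive) are made up for illustration, and the input is assumed to contain no nested <lin> elements.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
final class LinRemover {
    // Pattern from this answer: a <lin ...> element whose body contains "Icom"
    // and never runs past a closing </lin>.
    private static final Pattern ICOM_LIN = Pattern.compile(
            "(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>");

    // Pattern from the question, kept only to show the over-match.
    private static final Pattern NAIVE = Pattern.compile(
            "<lin\\s(.+?Icom.+?)+</lin>");

    static String removeIcomLins(String xml) {
        return ICOM_LIN.matcher(xml).replaceAll("");
    }

    static String removeNaive(String xml) {
        return NAIVE.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<lins>"
                + "<lin index=\"1\"><feature>Something</feature></lin>"
                + "<lin index=\"2\"><feature>Icom</feature></lin>"
                + "</lins>";
        // The naive pattern starts at the first <lin and swallows both elements:
        // prints <lins></lins>
        System.out.println(removeNaive(xml));
        // The tempered pattern removes only the element that contains Icom:
        // prints <lins><lin index="1"><feature>Something</feature></lin></lins>
        System.out.println(removeIcomLins(xml));
    }
}
```

The negative lookahead (?!</lin) is checked before every character, so a single match can never cross from one <lin> element into the next.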

Answer 2

You can't do it with a regexp.
Consider this example:

<tag>
    <tag> something </tag>
</tag>

<tag>
</tag>

If you use the regexp "<tag>(.*)</tag>", your group will be:

    <tag> something </tag>
</tag>

<tag>

and if you use the regexp "<tag>(.*?)</tag>", your group will be:

    <tag> something

You should use something like a stack to find the closing tag that matches a given opening tag.
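
A rough sketch of the stack idea, under the assumption that the tags are simple (no comments or CDATA containing tag-like text). The class TagMatcher and the method findMatchingEnd are hypothetical names chosen for this example. A regex is used only to tokenize opening and closing tags; the stack does the pairing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the stack approach.
final class TagMatcher {
    /**
     * Returns the index just past the closing tag that matches the opening
     * tag starting at openIndex, or -1 if the tags are unbalanced.
     */
    static int findMatchingEnd(String text, String tagName, int openIndex) {
        // Group 1 is "/" for a closing tag and empty for an opening tag.
        Pattern token = Pattern.compile(
                "<(/?)" + Pattern.quote(tagName) + "\\b[^>]*>");
        Matcher m = token.matcher(text);
        Deque<Integer> open = new ArrayDeque<>();
        for (boolean found = m.find(openIndex); found; found = m.find()) {
            if (m.group().endsWith("/>")) {
                continue; // self-closing tag: opens and closes at once
            }
            if (m.group(1).isEmpty()) {
                open.push(m.start()); // opening tag: remember where it started
            } else {
                if (open.isEmpty()) {
                    return -1; // closing tag without an opening tag
                }
                open.pop();
                if (open.isEmpty()) {
                    return m.end(); // the outermost tag has just been closed
                }
            }
        }
        return -1; // ran out of input with tags still open
    }

    public static void main(String[] args) {
        String text = "<tag>\n    <tag> something </tag>\n</tag>\n\n<tag>\n</tag>";
        int end = findMatchingEnd(text, "tag", 0);
        // Prints the first outer element, including its nested <tag>.
        System.out.println(text.substring(0, end));
    }
}
```

Since only the depth matters here, a plain counter would work as well; the stack of start offsets is handy if you also need the position of each inner element.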

Answer 3

I think you need to add more groups to the regexp.

Add a group for the precondition that starts the check, e.g. (

Then a group for the stuff in between, a group for Icom, etc.

So, off the top of my head, my regex would look like:

(<lin\ index\=)(\w+Icom\w+)(\<\/lin>)

Note that the escaping might be slightly off, but in essence you need more groups and some less eager matchers.
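
One way to read the "more groups, less eager matchers" idea is sketched below. This is not the regex above but a variant: it captures the opening tag, a lazy body, and the closing tag as three groups, then decides in Java whether to drop the element. The class GroupedLinFilter and the method dropLinsContaining are hypothetical names, and the sketch assumes <lin> elements are not nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the grouped, lazy variant.
final class GroupedLinFilter {
    // Group 1: opening tag. Group 2: body (lazy, so it stops at the first
    // closing tag). Group 3: closing tag. DOTALL lets the body span lines.
    private static final Pattern LIN = Pattern.compile(
            "(<lin\\b[^>]*>)(.*?)(</lin>)", Pattern.DOTALL);

    static String dropLinsContaining(String xml, String keyword) {
        Matcher m = LIN.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean drop = m.group(2).contains(keyword);
            // Keep the element verbatim unless its body contains the keyword.
            String replacement = drop ? "" : Matcher.quoteReplacement(m.group());
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<lins>"
                + "<lin index=\"1\"><feature>Something</feature></lin>"
                + "<lin index=\"2\"><feature>Icom</feature></lin>"
                + "</lins>";
        // prints <lins><lin index="1"><feature>Something</feature></lin></lins>
        System.out.println(dropLinsContaining(xml, "Icom"));
    }
}
```

Because each match covers exactly one element, the keyword test cannot leak from one <lin> into the next, which is the problem the pattern in the question runs into.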


Answer 1
4 2011-12-21 14:27:47

Answer 2
0 2011-12-21 14:28:47

Answer 3
0 2011-12-21 15:17:23
