Remove empty tags from XML using Java

Question

I'm adding some functionality to a servlet. One of the things I want to do is, on receiving the InputStream (basically a PDF document parsed into an XML format), store that data in a String object and then delete all the empty tags, but I haven't gotten any good results so far.

This is the data the servlet is receiving:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <CoName/>
                    <EmpAdd>
                        <Address><Add1/><Add2/><Town/><County/><Pcode/></Address>
                    </EmpAdd>
                    <PosHeld>DEVELOPER</PosHeld>
                    <Email/>
                    <ConNo/>
                    <Nationality/>
                    <PPSNo/>
                    <EmpNo/>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>

The final result should look like this:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <PosHeld>DEVELOPER</PosHeld>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>

My apologies if this is a repeated question, but I researched similar posts and none of them gave me the correct approach, which is why I am asking in a separate post.

Thank you in advance.

Answer 1

Here's a regex way of doing what you want. I'm sure there are some edge cases I'm not thinking of; with regex you sometimes can't tell. Also, a DOM parser is probably the best way to do this.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RemoveEmptyTags {
        public static void main(String[] args) {
            // Patterns are applied once each, in this order; nesting deeper
            // than in the sample below may need repeated passes.
            String[] patterns = new String[] {
                // Removes empty elements that look like <ElementName/>
                "\\s*<\\w+/>",
                // Removes empty elements that look like <ElementName></ElementName>
                "\\s*<\\w+></\\w+>",
                // Removes empty elements whose tags are split across lines:
                // <ElementName>
                // </ElementName>
                "\\s*<\\w+>\n*\\s*</\\w+>"
            };

            String xml = "    <form1>\n" +
                         "        <GenInfo>\n" +
                         "            <Section1>\n" +
                         "                <EmployeeDet>\n" +
                         "                    <Title>999990000</Title>\n" +
                         "                    <Firstname>MIKE</Firstname>\n" +
                         "                    <Surname>SPENCER</Surname>\n" +
                         "                    <CoName/>\n" +
                         "                    <EmpAdd>\n" +
                         "                        <Address><Add1/><Add2/><Town/><County/><Pcode/></Address>\n" +
                         "                    </EmpAdd>\n" +
                         "                    <PosHeld>DEVELOPER</PosHeld>\n" +
                         "                    <Email/>\n" +
                         "                    <ConNo/>\n" +
                         "                    <Nationality/>\n" +
                         "                    <PPSNo/>\n" +
                         "                    <EmpNo/>\n" +
                         "                </EmployeeDet>\n" +
                         "            </Section1>\n" +
                         "        </GenInfo>\n" +
                         "    </form1>";

            // Strip every match of each pattern from the document.
            for (String pattern : patterns) {
                Matcher matcher = Pattern.compile(pattern).matcher(xml);
                xml = matcher.replaceAll("");
            }

            System.out.println(xml);
        }
    }

Results:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <PosHeld>DEVELOPER</PosHeld>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>
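
One edge case with the approach above is nesting deeper than the sample: each pattern runs only once, so an element that becomes empty late can survive. A possible variation, offered only as a sketch, repeats a single combined pattern until the string stops changing. The class name `RegexEmptyTagLoop` and the method `removeEmptyTags` are made up for illustration, and the pattern still assumes simple tag names with no attributes, namespace prefixes, comments, or CDATA.

```java
import java.util.regex.Pattern;

public class RegexEmptyTagLoop {
    // Matches <Name/> or <Name></Name> (whitespace allowed between the tags),
    // plus any leading whitespace. The backreference \2 requires the closing
    // tag name to equal the opening one.
    private static final Pattern EMPTY =
            Pattern.compile("\\s*<(\\w+)\\s*/>|\\s*<(\\w+)>\\s*</\\2>");

    // Hypothetical helper: strips empty elements until a pass changes nothing.
    static String removeEmptyTags(String xml) {
        String previous;
        do {
            previous = xml;
            xml = EMPTY.matcher(xml).replaceAll("");
        } while (!xml.equals(previous));
        return xml;
    }

    public static void main(String[] args) {
        // <d/> is empty, which empties <c>, which in turn empties <b>.
        String xml = "<a><b><c><d/></c></b><e>x</e></a>";
        System.out.println(removeEmptyTags(xml)); // prints <a><e>x</e></a>
    }
}
```

Note that if every element ends up empty, the root is removed as well and the result is an empty string.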

Answer 2

What you have to do is iterate recursively over all the nodes. Once you find a leaf that is empty, just remove it.

There is a very good example using a DOM parser here
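
The recursive idea can be sketched with the JDK's built-in DOM and transformer APIs. This is not the linked example, just a minimal illustration under a few assumptions: the names `EmptyTagRemover`, `prune`, and `removeEmptyTags` are invented here, an element counts as empty when it has no attributes and no children other than whitespace-only text, and the root element itself is never removed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmptyTagRemover {

    // Removes empty descendants bottom-up and reports whether the node
    // itself is empty afterwards (assumption: attributes count as content).
    static boolean prune(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards: the NodeList is live, so removals shift later indices.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (prune(child)) {
                    node.removeChild(child);
                }
            } else if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child); // drop whitespace-only text
            }
        }
        return !node.hasChildNodes() && !node.hasAttributes();
    }

    // Hypothetical helper: parse, prune, and serialize back to a String.
    static String removeEmptyTags(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            prune(doc.getDocumentElement());

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<form1><EmployeeDet><Firstname>MIKE</Firstname><CoName/>"
                + "<EmpAdd><Address><Add1/><Add2/></Address></EmpAdd>"
                + "<PosHeld>DEVELOPER</PosHeld><Email/></EmployeeDet></form1>";
        System.out.println(removeEmptyTags(xml));
    }
}
```

Because the parser understands the document structure, this version does not depend on nesting depth or on how the tags are laid out across lines. The exact indentation of the output can vary between JDK versions.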

2 answers

Answer 1
4, accepted, 2015-06-01 17:09:48

Answer 2
0, 2015-06-01 16:00:37
