简体   繁体   English

使用Java删除XML中的空标签

[英]Remove empty tags at XML using Java

I'm giving some functionality to a servlet, one of the things I want to do is, when receiving the InputStream (which is basically a PDF document parsed into an XML format) set that data to a String object, then I try to delete all the empty tags, but I haven't got any good result so far: 我为Servlet提供了一些功能,我想做的一件事就是,当收到InputStream(基本上是解析为XML格式的PDF文档)时,将该数据设置为String对象,然后尝试删除所有空标签,但到目前为止我还没有得到任何好的结果:

This is the data the servlet is receiving 这是servlet接收的数据

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <CoName/>
                    <EmpAdd>
                        <Address><Add1/><Add2/><Town/><County/><Pcode/></Address>
                    </EmpAdd>
                    <PosHeld>DEVELOPER</PosHeld>
                    <Email/>
                    <ConNo/>
                    <Nationality/>
                    <PPSNo/>
                    <EmpNo/>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>

The final result should be looking like this: 最终结果应如下所示:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <PosHeld>DEVELOPER</PosHeld>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>

My apologies if it is a repeated question but I did some research over similar posts and none of them could provide me the correct approach, that's why I am asking you in a separate post. 我很抱歉,如果这是一个重复的问题,但是我对类似的帖子进行了一些研究,但没有一个可以为我提供正确的方法,这就是为什么我要在另一篇帖子中问您。

Thank you in advance. 先感谢您。

Here's regex way of doing what you're wanting. 这是您想要的regex方式。 I'm sure there are probably some "edge" cases that I'm not thinking of, but sometimes you can't tell when you use regex . 我敢肯定我可能没有想到一些“边缘”情况,但是有时您无法确定何时使用regex Also, a DOM parser is probably the best way to do this. 另外,DOM解析器可能是执行此操作的最佳方法。

public static void main(String[] args) throws Exception {
    String[] patterns = new String[] {
        // This will remove empty elements that look like <ElementName/>
        "\\s*<\\w+/>", 
        // This will remove empty elements that look like <ElementName></ElementName>
        "\\s*<\\w+></\\w+>", 
        // This will remove empty elements that look like 
        // <ElementName>
        // </ElementName>
        "\\s*<\\w+>\n*\\s*</\\w+>"
    };

    String xml = "    <form1>\n" +
                    "        <GenInfo>\n" +
                    "            <Section1>\n" +
                    "                <EmployeeDet>\n" +
                    "                    <Title>999990000</Title>\n" +
                    "                    <Firstname>MIKE</Firstname>\n" +
                    "                    <Surname>SPENCER</Surname>\n" +
                    "                    <CoName/>\n" +
                    "                    <EmpAdd>\n" +
                    "                        <Address><Add1/><Add2/><Town/><County/><Pcode/></Address>\n" +
                    "                    </EmpAdd>\n" +
                    "                    <PosHeld>DEVELOPER</PosHeld>\n" +
                    "                    <Email/>\n" +
                    "                    <ConNo/>\n" +
                    "                    <Nationality/>\n" +
                    "                    <PPSNo/>\n" +
                    "                    <EmpNo/>\n" +
                    "                </EmployeeDet>\n" +
                    "            </Section1>\n" +
                    "        </GenInfo>\n" +
                    "    </form1>";

    for (String pattern : patterns) {
        Matcher matcher = Pattern.compile(pattern).matcher(xml);
        xml = matcher.replaceAll("");
    }

    System.out.println(xml);
}

Results: 结果:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <PosHeld>DEVELOPER</PosHeld>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>

What you have to do is iterate recursively over all the nodes. 您要做的是在所有节点上进行递归迭代。 And once you've found a leaf, it's it's empty just remove it. 找到叶子后,它就空了,只需将其删除即可。

There is a very good example using DOM parser here 有一个很好的例子使用DOM解析器这里

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM