Pretty-printing XML using Java

There are a dozen threads on this topic, but none of their answers work satisfactorily for me. It seems one needs to use a specific DOM implementation. However, I cannot get it to read the XML input:

@Test
public void testPrettyPrintConvertDomLevel3() throws UnsupportedEncodingException {
    String unformattedXml
            = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><QueryMessage\n"
            + "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
            + "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + "    <Query>\n"
            + "        <query:CategorySchemeWhere>\n"
            + "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n"
            + "        </query:CategorySchemeWhere>\n"
            + "    </Query>\n\n\n\n\n"
            + "</QueryMessage>";

    System.out.println(prettyPrintWithXercesDomLevel3(unformattedXml.getBytes("UTF-16")));
}

Here is the method:

public static String prettyPrintWithXercesDomLevel3(byte[] input) {
    try {
//System.setProperty(DOMImplementationRegistry.PROPERTY,"org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0");
        if (impl == null) {
            throw new RuntimeException("No DOMImplementation found !");
        }

        log.info(String.format("DOMImplementationLS: %s", impl.getClass().getName()));

        LSParser parser = impl.createLSParser(
                DOMImplementationLS.MODE_SYNCHRONOUS,
                //"http://www.w3.org/2001/XMLSchema");
                "http://www.w3.org/TR/REC-xml");
        log.info(String.format("LSParser: %s", parser.getClass().getName()));
        LSInput lsi = impl.createLSInput();
        lsi.setByteStream(new ByteArrayInputStream(input));
        Document doc = parser.parse(lsi);

        LSSerializer serializer = impl.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print",Boolean.TRUE);
        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        output.setByteStream(baos);
        serializer.write(doc, output);
        // Decode with the charset set on the LSOutput rather than the platform default.
        return baos.toString("UTF-8");
//            return serializer.writeToString(doc);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

However, the pretty-printing does not work. Any ideas?

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

/**
 * @author lananda
 */
public class PrettyXmlWriter {

    public static void main(String... args) {
        String unformattedXml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<QueryMessage\n"
                + " xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
                + " xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
                + " <Query>\n"
                + " <query:CategorySchemeWhere>\n"
                + " \t\t\t\t\t <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n"
                + " </query:CategorySchemeWhere>\n"
                + " </Query>\n\n\n\n\n"
                + "</QueryMessage>";
        unformattedXml = unformattedXml.replaceAll("\\s+", " ");
        String format = format(unformattedXml);
        System.out.println(format);
    }

    public static String format(String xml) {
        try {
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));

            // May need this:
            // System.setProperty(DOMImplementationRegistry.PROPERTY, "com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");

            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();

            // Set this to true if the output needs to be beautified.
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
            // Set this to true if the declaration needs to be output.
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);

            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

The encoding of your Java source file must also match the encoding you are trying to run with. If you are using Eclipse, the default encoding is CP-1252 for some reason. The first thing I do when I install a new version of Eclipse is change the file encoding to UTF-8.

I used your code and it worked fine, since my source file encoding was UTF-8.

Update: it seems that all whitespace is significant in XML: "Based on the W3C XML specification, the Oracle XML Developer's Kit (XDK) XML parsers, by default, preserves all whitespace." Therefore it is quite reasonable NOT to make that feature part of a public API. org.jdom2 provides a reasonable implementation:

@Test
public void testPrettyPrintConvertDomLevel3() throws UnsupportedEncodingException, JDOMException, IOException {
    String unformattedXml
            = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><QueryMessage\n"
            + "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
            + "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + "    <Query>\n"
            + "        <query:CategorySchemeWhere>\n"
            + "   \t\t\t\t\t         <query:AgencyID>ECB \n </query:AgencyID>\n"
            + "        </query:CategorySchemeWhere>\n"
            + "    </Query>\n\n\n\n\n"
            + "</QueryMessage>";
    SAXBuilder builder = new SAXBuilder();
    Document doc = builder.build(new ByteArrayInputStream(unformattedXml.getBytes("UTF-16")));
    Format f = Format.getPrettyFormat();
    f.setLineSeparator(LineSeparator.NL);
    f.setTextMode(Format.TextMode.TRIM_FULL_WHITE);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    new XMLOutputter(f).output(doc, baos);
    assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<QueryMessage xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\" xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + "  <Query>\n"
            + "    <query:CategorySchemeWhere>\n"
            + "      <query:AgencyID>ECB \n"
            + " </query:AgencyID>\n"
            + "    </query:CategorySchemeWhere>\n"
            + "  </Query>\n"
            + "</QueryMessage>\n", baos.toString("UTF-8"));
}
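
If adding JDOM is not an option, the whitespace point from the update can also be handled with the JDK alone. The following is only a sketch under assumptions: the class name WhitespaceStrippingPrettyPrinter and the XPath expression are illustrative and not part of the original code, and dropping whitespace-only text nodes is only safe when that whitespace carries no meaning in your documents. The exact indentation produced depends on the DOM implementation the JDK ships with.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here for illustration only.
public class WhitespaceStrippingPrettyPrinter {

    /** Parses the XML, removes whitespace-only text nodes, then serializes with indentation. */
    public static String format(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select text nodes that contain nothing but whitespace (assumed to be insignificant).
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }

        // Same DOM Level 3 Load and Save serializer as in the question.
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSSerializer serializer = impl.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(format("<a>\n\n  <b>text</b>\n\n\n</a>"));
    }
}
```

Note that this only removes text nodes consisting entirely of whitespace. Trailing newlines inside an element with real content, such as the AgencyID value in the question, are left untouched, much like the TRIM_FULL_WHITE mode in the JDOM test above.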
