Pretty-printing XML using Java
There are a dozen threads on this topic, but none of their answers work satisfactorily for me. It seems one needs to use a specific DOM implementation. However, I cannot get it to read the XML input:
@Test
public void testPrettyPrintConvertDomLevel3() throws UnsupportedEncodingException {
    String unformattedXml
            = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><QueryMessage\n"
            + " xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
            + " xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + " <Query>\n"
            + " <query:CategorySchemeWhere>\n"
            + " \t\t\t\t\t <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n"
            + " </query:CategorySchemeWhere>\n"
            + " </Query>\n\n\n\n\n"
            + "</QueryMessage>";
    System.out.println(prettyPrintWithXercesDomLevel3(unformattedXml.getBytes("UTF-16")));
}
Here is the method:
public static String prettyPrintWithXercesDomLevel3(byte[] input) {
    try {
        //System.setProperty(DOMImplementationRegistry.PROPERTY,"org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0");
        if (impl == null) {
            throw new RuntimeException("No DOMImplementation found !");
        }
        log.info(String.format("DOMImplementationLS: %s", impl.getClass().getName()));
        LSParser parser = impl.createLSParser(
                DOMImplementationLS.MODE_SYNCHRONOUS,
                //"http://www.w3.org/2001/XMLSchema");
                "http://www.w3.org/TR/REC-xml");
        log.info(String.format("LSParser: %s", parser.getClass().getName()));
        LSInput lsi = impl.createLSInput();
        lsi.setByteStream(new ByteArrayInputStream(input));
        Document doc = parser.parse(lsi);
        LSSerializer serializer = impl.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        output.setByteStream(baos);
        serializer.write(doc, output);
        return baos.toString();
        // return serializer.writeToString(doc);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
However, the pretty-printing does not work. Any ideas?
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

/**
 * @author lananda
 */
public class PrettyXmlWriter {

    public static void main(String... args) {
        String unformattedXml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<QueryMessage\n"
                + " xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
                + " xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
                + " <Query>\n"
                + " <query:CategorySchemeWhere>\n"
                + " \t\t\t\t\t <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n"
                + " </query:CategorySchemeWhere>\n"
                + " </Query>\n\n\n\n\n"
                + "</QueryMessage>";
        // Collapse every run of whitespace into a single space before parsing.
        unformattedXml = unformattedXml.replaceAll("\\s+", " ");
        String format = format(unformattedXml);
        System.out.println(format);
    }

    public static String format(String xml) {
        try {
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));
            //May need this: System.setProperty(DOMImplementationRegistry.PROPERTY,"com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");
            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); // Set this to true if the output needs to be beautified.
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration); // Set this to true if the declaration is needed to be outputted.
            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
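The step that does most of the work in the code above is the `replaceAll` call, which runs before the XML is parsed. As a small sketch of what it does to a reduced input (the class name `WhitespaceCollapseDemo` and the sample string are made up for illustration):

```java
public class WhitespaceCollapseDemo {

    // Replaces every run of whitespace (spaces, tabs, newlines) with one space.
    public static String collapse(String xml) {
        return xml.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "<a>\n\n  <b>ECB\n\n\n</b>\n</a>";
        // prints: <a> <b>ECB </b> </a>
        System.out.println(collapse(input));
    }
}
```

Note that the regex is applied to the whole string, so whitespace inside text content and attribute values is collapsed too (here `ECB\n\n\n` becomes `ECB `). That is fine for this input, but it may not be for documents where such whitespace matters.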
The encoding of your Java source file must also match what you are trying to run with. If you are using Eclipse, the default encoding is CP-1252 for some reason. The first thing I do when I install a new version of Eclipse is change the file encoding to UTF-8.
I used your code and it worked fine since my source file encoding was UTF-8.
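A related place where the platform default encoding can sneak in is the `baos.toString()` call in the question, which decodes the bytes with the default charset even though the output was written as UTF-8. A minimal sketch of decoding with an explicitly named charset instead (the class and method names are hypothetical, not part of the question's code):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Encodes the text as UTF-8 into a buffer and decodes it again with the
    // same, explicitly named charset rather than the platform default.
    public static String roundTripUtf8(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        baos.write(bytes, 0, bytes.length);
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTripUtf8("<AgencyID>ECB</AgencyID>"));
    }
}
```

For pure ASCII content such as the sample document the difference is usually invisible, which may be why the problem only shows up on some machines.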
Update: it seems that all whitespace is significant in XML: "Based on the W3C XML specification, the Oracle XML Developer's Kit (XDK) XML parsers, by default, preserve all whitespace." Therefore it is quite reasonable NOT to make that feature part of a public API. org.jdom2 provides a reasonable implementation:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.LineSeparator;
import org.jdom2.output.XMLOutputter;

@Test
public void testPrettyPrintConvertDomLevel3() throws UnsupportedEncodingException, JDOMException, IOException {
    String unformattedXml
            = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><QueryMessage\n"
            + " xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
            + " xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + " <Query>\n"
            + " <query:CategorySchemeWhere>\n"
            + " \t\t\t\t\t <query:AgencyID>ECB \n </query:AgencyID>\n"
            + " </query:CategorySchemeWhere>\n"
            + " </Query>\n\n\n\n\n"
            + "</QueryMessage>";
    SAXBuilder builder = new SAXBuilder();
    Document doc = builder.build(new ByteArrayInputStream(unformattedXml.getBytes("UTF-16")));
    // Pretty format, Unix line endings, and drop text that is whitespace only.
    Format f = Format.getPrettyFormat();
    f.setLineSeparator(LineSeparator.NL);
    f.setTextMode(Format.TextMode.TRIM_FULL_WHITE);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    new XMLOutputter(f).output(doc, baos);
    assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<QueryMessage xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\" xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + " <Query>\n"
            + " <query:CategorySchemeWhere>\n"
            + " <query:AgencyID>ECB \n"
            + " </query:AgencyID>\n"
            + " </query:CategorySchemeWhere>\n"
            + " </Query>\n"
            + "</QueryMessage>\n", baos.toString());
}
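If adding JDOM is not an option, a similar effect can probably be had with the JDK alone. The following is only a sketch, assuming it is acceptable to discard every whitespace-only text node before serializing. The class name `WhitespaceStrippingPrettyPrinter` is hypothetical, and the exact indentation the JDK's `LSSerializer` produces can differ between Java versions:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class WhitespaceStrippingPrettyPrinter {

    public static String prettyPrint(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Find text nodes that contain nothing but whitespace and remove
            // them, so the serializer is free to insert its own indentation.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node blank = blanks.item(i);
                blank.getParentNode().removeChild(blank);
            }

            // Same DOM Level 3 Load/Save serializer setup as in the answer above.
            DOMImplementationLS impl = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            LSSerializer writer = impl.createLSSerializer();
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
            writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
            return writer.writeToString(doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prettyPrint("<a>\n\n\n  <b>x</b>\n\n</a>"));
    }
}
```

Unlike the regex approach, this leaves text such as `ECB \n ` untouched, because that node is not whitespace only. Whether that is desirable depends on whether the trailing whitespace in such values carries meaning.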