
Java: SAXParser character reference decoding

With reference to this question, Java: splitting up a large XML file with SAXParser, I'm essentially reading in an XML file using SAXParser and echoing it to another file.

My problem is that the content of my input file contains character references, which are decoded as the file is read in. How can I stop this? I want to write out the raw characters with no decoding of references.

(I can't give an example as they are decoded in the page!)
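To make the behaviour concrete, here is a minimal sketch (not the asker's actual code) that uses only the JDK's built-in SAX parser. The class name `SaxDecodingDemo`, the helper `textSeenByHandler`, and the sample document are made up for illustration. It feeds the parser a string containing numeric character references and collects whatever reaches `characters()`:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDecodingDemo {

    // Hypothetical helper: returns the text content exactly as the handler receives it.
    static String textSeenByHandler(String xml) throws Exception {
        StringBuilder seen = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // By the time this callback fires, references are already decoded.
                seen.append(ch, start, length);
            }
        });
        return seen.toString();
    }

    public static void main(String[] args) throws Exception {
        // The input holds "&#62;" and "&#x41;"; the handler sees ">" and "A".
        System.out.println(textSeenByHandler("<example>&#62; &#x41;</example>"));
    }
}
```

The handler receives the decoded characters only, so it has no way of telling whether the input spelled a character literally, as a decimal reference, or as a hexadecimal one.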

dom4j's XMLWriter class will re-encode these characters. For example, this code:

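import org.dom4j.io.XMLWriter;
import org.xml.sax.helpers.AttributesImpl;

// XMLWriter escapes reserved characters in text content as it writes them.
XMLWriter writer = new XMLWriter(System.out);
writer.startElement(null, null, "example", new AttributesImpl());
writer.write(">"); // text content, escaped on output
writer.endElement(null, null, "example");
writer.flush();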

will produce this output:

<example>&gt;</example>
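If pulling in dom4j is not an option, the same idea can be sketched with the JDK alone. The following is an assumption-laden illustration, not dom4j's implementation: the class `ReferenceEscaper`, the method `escape`, and the choice to emit decimal references for everything above `0x7E` are all hypothetical.

```java
public class ReferenceEscaper {

    // Hypothetical helper: re-encodes markup characters and non-ASCII code points.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp > 0x7E) {
                        // Assumption: always use the decimal form, e.g. &#233;
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("<example>" + escape(">") + "</example>");
    }
}
```

Calling `escape` from the handler's `characters()` callback before writing would re-encode the text, but it cannot reproduce the original spelling: a character that was written as a hexadecimal reference in the input would come out in whatever form the escaper chooses.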

I don't think you can do this with SAX. However, you can tell a StAX parser (as opposed to SAX) not to decode character entities when parsing (see this prior answer). You should then be able to echo them to the output in the same format in which the parser read them.
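A minimal sketch of the StAX approach, assuming the JDK's bundled StAX implementation and the standard `XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES` property (the linked answer may use a different setting). The class `StaxEntityEcho`, the helper `echoText`, and the sample document with its `foo` entity are hypothetical. Note that this only exercises a DTD-declared named entity; it does not establish how numeric character references are reported, which should be checked against the parser in use.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxEntityEcho {

    // Hypothetical helper: echoes text content, writing entity references back as &name;.
    static String echoText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to report entity references instead of expanding them.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.CHARACTERS) {
                out.append(reader.getText());
            } else if (event == XMLStreamConstants.ENTITY_REFERENCE) {
                // For this event type, getLocalName() returns the entity name.
                out.append('&').append(reader.getLocalName()).append(';');
            }
        }
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<!DOCTYPE a [<!ENTITY foo \"bar\">]><a>x&foo;y</a>";
        System.out.println(echoText(xml));
    }
}
```

The design point is that the reference arrives as its own `ENTITY_REFERENCE` event carrying the entity name, so the writer can emit it unchanged rather than its replacement text.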

StAX should perform just as well as SAX.
