Java XML Prettyprinting incorporates DTD Comments?
When parsing XML data with the built-in Java XML processing engine (tested with JDK 8u151 and 8u161), I get strange results. If I use parameter entity references in a DTD, all SGML comments that follow them in the DTD end up in the output document.
This is the (minimal) code I am running:
import java.io.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

public class FormatBug {
    public static void main( String[] args ) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        Reader in = new FileReader( args[0] );
        Writer out = new FileWriter( args[1] );
        t.transform( new SAXSource( new InputSource(in) ), new StreamResult(out) );
        out.flush();
        out.close();
    }
}
The source document looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc><p>This is a <b>bold</b> line.</p></doc>
The DTD (doc.dtd) looks as follows:
<!ELEMENT doc (p+)>
<!ENTITY % floats "b" >
<!-- comment before -->
<!ELEMENT p ( #PCDATA | %floats; )*>
<!-- comment after -->
<!ELEMENT b (#PCDATA)>
The result looks like this:
<!-- comment after --><!DOCTYPE doc SYSTEM "doc.dtd">
<doc><p>This is a <b>bold</b> line.</p></doc>
The rule for p can be replaced with the following:
<!ELEMENT p ( #PCDATA | b )*>
With that change, the spurious comment disappears.
Can someone explain what is going on here?
I also checked against JDK 9.0.4, where all comments are copied, so I assume I might be doing something entirely wrong.
I can confirm this happens on JDK 1.8.0_151. I think the problem comes from using SAXSource as the input source for the transformation, because Java's javax.xml.parsers.SAXParser ignores comments.
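For background on the remark about comments: the core SAX ContentHandler has no callback for comments. A parser only reports them to a client that registers an org.xml.sax.ext.LexicalHandler, which also marks where the DTD starts and ends. The following is a minimal sketch of such a probe, not an explanation of the JDK behavior above. The class name DtdCommentProbe, the method dtdComments, and the inline sample document are made up for illustration, and the sample uses an internal DTD subset rather than an external doc.dtd so that it is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: collects the comments a SAX parser reports inside the DTD.
public class DtdCommentProbe {

    static List<String> dtdComments(String xml) throws Exception {
        List<String> found = new ArrayList<>();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            private boolean inDtd;

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                inDtd = true;
            }

            @Override
            public void endDTD() {
                inDtd = false;
            }

            @Override
            public void comment(char[] ch, int start, int length) {
                // Keep only comments reported between startDTD and endDTD.
                if (inDtd) {
                    found.add(new String(ch, start, length).trim());
                }
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Without this property the parser has nowhere to deliver comment events.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with an internal subset; a body comment is added for contrast.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>"
                + "<!-- comment before --><!-- comment after -->]>"
                + "<doc>text<!-- body comment --></doc>";
        System.out.println(dtdComments(xml));
    }
}
```

A probe like this can help tell apart what the parser reports from what the Transformer later writes out, since the serializer sees only the events passed on to it.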
The following variant using StAX doesn't print spurious comments on JDK 1.8, so it might help you keep a single Java source that behaves the same on both JDK 1.8 and JDK 9:
import java.io.*;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.*;
import javax.xml.transform.stream.*;

public class FormatBugUsingStaX {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        // try-with-resources closes both files, which also flushes the output writer
        try (Reader in = new InputStreamReader(new FileInputStream(args[0]));
             Writer out = new FileWriter(args[1])) {
            XMLStreamReader streamReader = factory.createXMLStreamReader(in);
            t.transform(new StAXSource(streamReader), new StreamResult(out));
        }
    }
}
Edit: If your intention is to keep the comments, you might have luck with another StAX implementation; cf. Transforming a StAX Source in Java.