When parsing XML data with Java's built-in XML processing engine (tested with JDK 8u151 and 8u161), I get strange results. If I use parameter entity references in a DTD, all comments that follow them in the DTD end up in the output document.
This is the (minimal) code I am running:
import java.io.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

public class FormatBug {
    public static void main( String[] args ) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        Reader in = new FileReader( args[0] );
        Writer out = new FileWriter( args[1] );
        t.transform( new SAXSource( new InputSource(in) ), new StreamResult(out) );
        out.flush();
        out.close();
    }
}
The source document looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc><p>This is a <b>bold</b> line.</p></doc>
The DTD (doc.dtd) looks as follows:
<!ELEMENT doc (p+)>
<!ENTITY % floats "b" >
<!-- comment before -->
<!ELEMENT p ( #PCDATA | %floats; )*>
<!-- comment after -->
<!ELEMENT b (#PCDATA)>
The result looks like this:
<!-- comment after --><!DOCTYPE doc SYSTEM "doc.dtd">
<doc><p>This is a <b>bold</b> line.</p></doc>
When I replace the rule for p with
<!ELEMENT p ( #PCDATA | b )*>
the spurious comment disappears.
Can someone explain what is going on here?
I also checked against JDK 9.0.4, where all comments are copied, so I assume I might be doing something entirely wrong.
I can confirm this happens on JDK 1.8.0_151. I consider it a problem caused by using SAXSource as the input source for the transformation, because Java's javax.xml.parsers.SAXParser ignores comments.
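For background: in SAX, comments are lexical events. A parser reports them only to a registered LexicalHandler, never to the ContentHandler. The following is a minimal sketch, not taken from the original post, for observing which comments the parser actually reports and whether each one arrives inside or outside the DTD. The class name CommentProbe and the "dtd:"/"body:" prefixes are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical diagnostic helper; not part of the original post.
class CommentProbe {

    // Parses the given XML text and returns every comment the parser reports,
    // prefixed with "dtd:" or "body:" depending on where it was seen.
    static List<String> collectComments(String xml) throws Exception {
        final List<String> seen = new ArrayList<>();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            private boolean inDtd = false;

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                inDtd = true;
            }

            @Override
            public void endDTD() {
                inDtd = false;
            }

            @Override
            public void comment(char[] ch, int start, int length) {
                seen.add((inDtd ? "dtd:" : "body:") + new String(ch, start, length));
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Without this property no comment events are delivered at all.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectComments("<doc><!-- hi --><p>x</p></doc>"));
    }
}
```

To probe the document from the question instead, pass an InputSource built from the file's path or URI to reader.parse so that doc.dtd can be resolved, then compare the reported comments on JDK 8 and JDK 9.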
The following variant using StAX doesn't print spurious comments on JDK 1.8, so it might help you get the same Java source behaving uniformly on both JDK 8 and JDK 9:
import java.io.*;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.*;
import javax.xml.transform.stream.*;

public class FormatBugUsingStaX {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        // try-with-resources flushes and closes both files, even on failure
        try (InputStream inputStream = new FileInputStream(args[0]);
             InputStreamReader in = new InputStreamReader(inputStream);
             Writer out = new FileWriter(args[1])) {
            // Feed the transformer from a StAX reader instead of a SAXSource
            XMLStreamReader streamReader = factory.createXMLStreamReader(in);
            t.transform(new StAXSource(streamReader), new StreamResult(out));
        }
    }
}
Edit: If your intention is to keep the comments, you might have luck with another StAX implementation; cf. "Transforming a StAX Source in Java".
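Before swapping implementations, it can help to check which ones the JDK actually picks up. A small sketch using only the standard JAXP factory lookups; the class name XmlImplementationInfo is hypothetical, and the printed class names depend on the JDK and on any libraries on the classpath:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper; not part of the original post.
class XmlImplementationInfo {

    // Name of the StAX factory class selected by the JAXP lookup mechanism.
    static String staxFactoryName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    // Name of the transformer factory class selected by the JAXP lookup mechanism.
    static String transformerFactoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("StAX factory:        " + staxFactoryName());
        System.out.println("Transformer factory: " + transformerFactoryName());
    }
}
```

Running this once on each JDK, with and without an alternative StAX library on the classpath, shows whether the alternative is really the one being used.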