
Java XML Prettyprinting incorporates DTD Comments?

When parsing XML data with the built-in Java XML processing engine (tested with JDK 8u151 and 8u161), I get strange results. If I use a parameter entity reference in a DTD, all subsequent comments from the DTD end up in the output document.

This is the (minimal) code I am running:

import java.io.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.InputSource;

public class FormatBug {

    public static void main( String[] args ) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        Reader in = new FileReader( args[0] );
        Writer out = new FileWriter( args[1] );
        t.transform( new SAXSource( new InputSource(in) ), new StreamResult(out) );
        out.flush();
        out.close();
    }
}

The source document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc><p>This is a <b>bold</b> line.</p></doc>

The DTD (doc.dtd) looks as follows:

<!ELEMENT doc (p+)>
<!ENTITY % floats "b" >
<!-- comment before -->
<!ELEMENT p ( #PCDATA | %floats; )*>
<!-- comment after -->
<!ELEMENT b (#PCDATA)>

The result looks like this:

<!-- comment after --><!DOCTYPE doc SYSTEM "doc.dtd">
<doc><p>This is a <b>bold</b> line.</p></doc>

When I replace the rule for p with

<!ELEMENT p ( #PCDATA | b )*>

the spurious comment disappears.

Can someone explain what is going on here?

I also checked against JDK 9.0.4, where all comments are copied, so I assume I might be doing something entirely wrong.

I can confirm this on JDK 1.8.0_151. I consider it a problem caused by using SAXSource as the input source for the transformation, because Java's javax.xml.parsers.SAXParser ignores comments.
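To see which comments the SAX layer actually reports, and whether they come from the DTD or from the document body, it can help to look at the raw parser events. A plain SAX ContentHandler never receives comments; they are delivered only to an org.xml.sax.ext.LexicalHandler if one is registered. The sketch below is a minimal probe under stated assumptions: the class name DtdCommentProbe and the method collectComments are made up for illustration, and the DTD is served from an in-memory string through an EntityResolver so that no files are needed. It does not reproduce or fix the transformer behaviour; it only shows what the parser hands over.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper for illustration; not part of the question or the answer.
class DtdCommentProbe {

    // Same document and DTD as in the question, held in memory.
    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE doc SYSTEM \"doc.dtd\">\n"
        + "<doc><p>This is a <b>bold</b> line.</p></doc>\n";

    static final String DTD =
        "<!ELEMENT doc (p+)>\n"
        + "<!ENTITY % floats \"b\" >\n"
        + "<!-- comment before -->\n"
        + "<!ELEMENT p ( #PCDATA | %floats; )*>\n"
        + "<!-- comment after -->\n"
        + "<!ELEMENT b (#PCDATA)>\n";

    // Parses DOC and returns one entry per reported comment,
    // prefixed with "DTD:" or "document:" depending on where it was seen.
    static List<String> collectComments() throws Exception {
        final List<String> seen = new ArrayList<>();

        // DefaultHandler2 implements LexicalHandler with empty methods.
        DefaultHandler2 handler = new DefaultHandler2() {
            private boolean inDtd = false;

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                inDtd = true;
            }

            @Override
            public void endDTD() {
                inDtd = false;
            }

            @Override
            public void comment(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                seen.add((inDtd ? "DTD" : "document") + ":" + text);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 property for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        // Serve the external DTD from memory instead of the file system.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader(DTD)));
        reader.parse(new InputSource(new StringReader(DOC)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        for (String line : collectComments()) {
            System.out.println(line);
        }
    }
}
```

If the probe tags both comments as coming from the DTD, the parser itself is reporting them in the right place, which would point at the code that consumes these events during the transformation rather than at the parser.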

The following variant uses StAX and doesn't print spurious comments on JDK 1.8, so it might help you get the same Java source to behave uniformly on both JDK 1.8 and JDK 9:

import java.io.*;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.*;
import javax.xml.transform.stream.*;

public class FormatBugUsingStaX {

    public static void main(String[] args) throws Exception {

        XMLInputFactory factory = XMLInputFactory.newInstance();
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();

        // try-with-resources closes both files, even if the transformation fails
        try (InputStream inputStream = new FileInputStream(args[0]);
             InputStreamReader in = new InputStreamReader(inputStream);
             Writer out = new FileWriter(args[1])) {
            // Feed the transformer from a StAX reader instead of a SAXSource
            XMLStreamReader streamReader = factory.createXMLStreamReader(in);
            t.transform(new StAXSource(streamReader), new StreamResult(out));
        }
    }
}

Edit: If you intend to keep the comments, you might have luck using another StAX implementation; cf. "Transforming a StAX Source in Java".
