Imagine you have an XML document and its DTD, but the document itself doesn't actually specify a DOCTYPE. How would you insert the DOCTYPE declaration, preferably by specifying it on the parser (similar to how you can set the schema for a document that will be parsed) or by inserting the necessary SAX events via an XMLFilter or the like?
I've found many references to EntityResolver, but that is what's invoked once a DOCTYPE is found during parsing, and it's used to point to a local DTD file. EntityResolver2 appears to have what I'm looking for, but I haven't found any examples of usage.
This is the closest I've come thus far (the code is Groovy, but close enough to Java that you should be able to follow it):
import org.xml.sax.*
import org.xml.sax.ext.*
import org.xml.sax.helpers.*

class XmlFilter extends XMLFilterImpl {
    public XmlFilter( XMLReader reader ) { super(reader) }

    @Override public void startDocument() {
        super.startDocument()
        super.resolveEntity( null, 'file:///./entity.dtd')
        println "filter startDocument"
    }
}

class MyHandler extends DefaultHandler2 {
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
        println "entity: $name, $publicId, $baseURI, $systemId"
        return new InputSource(new StringReader('<!ENTITY asdf "¡">'))
    }
}

def handler = new MyHandler()
def parser = XMLReaderFactory.createXMLReader()
parser.setFeature 'http://xml.org/sax/features/use-entity-resolver2', true
def filter = new XmlFilter( parser )
filter.setContentHandler( handler )
filter.setEntityResolver( handler )
filter.parse( new InputSource(new StringReader('''<?xml version="1.0" ?>
<test>one &asdf; two! ¡£¢</test>''')) );
I see resolveEntity called, but I still hit:
org.xml.sax.SAXParseException: The entity "asdf" was referenced, but not declared.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:333)
I guess this is because there's no way to add SAX events that the parser itself knows about: a filter sits downstream of the parser, so any events I add there are only passed along to the ContentHandler. The document therefore has to be valid going into the XMLReader. Is there any way around this? I know I can modify the raw stream to add a DOCTYPE, or possibly do a transform to set a DTD. Are there any other options?
You can try DoctypeChanger which modifies the raw stream as you suggested:
DoctypeChanger is a Java class that lets you add, modify or remove a DOCTYPE declaration from a byte stream as it is fed into an XML parser.
InputStream in = ... // get your XML InputStream
DOCTYPEChangerStream changer = new DOCTYPEChangerStream(in);
changer.setGenerator(
    new DoctypeGenerator() {
        public Doctype generate(Doctype old) {
            return new DoctypeImpl("rootElement", "pubId", "sysId", "internalSubset");
        }
    }
);
// ... and pass changer on to the parser.
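If adding a library is not an option, the same raw-stream idea can be sketched with the JDK alone. The following is only an illustration under some assumptions: the document fits in memory as a String, it does not already contain a DOCTYPE, and the class and method names (DoctypeInjector, withDoctype, parseText) are made up for this example. It inserts a DOCTYPE with an internal subset right after the XML declaration, so the parser sees the entity declaration before it reaches the reference.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DoctypeInjector {

    // Hypothetical helper: returns the document with the given DOCTYPE
    // inserted after the XML declaration (or at the very start if there is none).
    static String withDoctype(String xml, String doctype) {
        int insertAt = 0;
        if (xml.startsWith("<?xml")) {
            insertAt = xml.indexOf("?>") + 2; // just past the end of the declaration
        }
        return xml.substring(0, insertAt) + "\n" + doctype + xml.substring(insertAt);
    }

    // Parses the document with the JDK's SAX parser and collects its character data.
    static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" ?>\n<test>one &asdf; two!</test>";
        // Internal subset declaring the entity that the document references.
        String doctype = "<!DOCTYPE test [<!ENTITY asdf \"&#161;\">]>";
        System.out.println(parseText(withDoctype(xml, doctype)));
    }
}
```

A streaming variant would have to read past the XML declaration on the byte stream before splicing in the DOCTYPE, which is presumably the kind of work DoctypeChanger does for you.",
    "<!DOCTYPE test [<!ENTITY asdf \"&#161;\">]>");
if (!injected.startsWith("<?xml version=\"1.0\" ?>\n<!DOCTYPE test [")) {
    throw new AssertionError("DOCTYPE not placed after XML declaration: " + injected);
}
String parsed = DoctypeInjector.parseText(injected);
if (!parsed.equals("one \u00A1 two!")) {
    throw new AssertionError("unexpected text: " + parsed);
}
String noDecl = DoctypeInjector.withDoctype("<test>&asdf;</test>",
    "<!DOCTYPE test [<!ENTITY asdf \"x\">]>");
if (!DoctypeInjector.parseText(noDecl).equals("x")) {
    throw new AssertionError("unexpected text without XML declaration");
}
</test>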
I would use an XSLT stylesheet to do an identity transform and use the xsl:output element along with the doctype-system attribute (and doctype-public if I wanted to add a public identifier).
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output doctype-system="test.dtd"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
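To apply a stylesheet like the one above from Java, something along these lines should work with the standard javax.xml.transform API. This is a minimal sketch, not a definitive implementation: the class and method names (AddDoctypeWithXslt, addDoctype) are invented for the example, and the stylesheet is embedded as a string only to keep it self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AddDoctypeWithXslt {

    // Identity transform whose xsl:output asks the serializer to emit a DOCTYPE.
    static final String STYLESHEET =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output doctype-system=\"test.dtd\"/>"
        + "<xsl:template match=\"node()|@*\">"
        + "<xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: runs the identity transform and returns the serialized result.
    static String addDoctype(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addDoctype("<test>one two!</test>"));
    }
}
```

Note that the transformer has to parse the input before it can re-serialize it, so this suits documents that are already well-formed without the DTD. An input that references an undeclared entity such as &asdf; would still fail at the parsing stage.");
if (!result.contains("<!DOCTYPE test SYSTEM \"test.dtd\">")) {
    throw new AssertionError("DOCTYPE missing from output: " + result);
}
if (!result.contains("<test>one two!</test>")) {
    throw new AssertionError("content not copied through: " + result);
}
</test>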