Java, XML, XSLT: Prevent DTD validation

Question

I use the Java (6) XML API to apply an XSLT transformation to an HTML document from the web. This document is well-formed XHTML and so contains a valid DTD declaration (<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">). Now a problem occurs: upon transformation, the XSLT processor tries to download the DTD, and the W3C server denies this with an HTTP 503 error (due to bandwidth limitation by the W3C).

How can I prevent the XSLT processor from downloading the DTD? I don't need my input document validated.

Source is:

import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

-- -

   String xslt = "<?xml version=\"1.0\"?>"+
   "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"+
   "    <xsl:output method=\"text\" />"+          
   "    <xsl:template match=\"//html/body//div[@id='bodyContent']/p[1]\"> "+
   "        <xsl:value-of select=\".\" />"+
   "     </xsl:template>"+
   "     <xsl:template match=\"text()\" />"+
   "</xsl:stylesheet>";

   try {
       Source xmlSource = new StreamSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award");
       Source xsltSource = new StreamSource(new StringReader(xslt));
       TransformerFactory ft = TransformerFactory.newInstance();

       Transformer trans = ft.newTransformer(xsltSource);

       trans.transform(xmlSource, new StreamResult(System.out));
   }
   catch (Exception e) {
       e.printStackTrace();
   }

I read the following questions here on SO, but they all use a different XML API:

"DTD download error while parsing XHTML document in XOM"

Thanks!

Answer 1

I recently had this issue while unmarshalling XML using JAXB. The answer was to create a SAXSource from an XMLReader and an InputSource, then pass that to the JAXB Unmarshaller's unmarshal() method. To avoid loading the external DTD, I set a custom EntityResolver on the XMLReader.

SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xmlr = sp.getXMLReader();
xmlr.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String pid, String sid) throws SAXException {
        if (sid.equals("your remote dtd url here"))
            return new InputSource(new StringReader("actual contents of remote dtd"));
        throw new SAXException("unable to resolve remote entity, sid = " + sid);
    }
});
SAXSource ss = new SAXSource(xmlr, myInputSource);

As written, this custom entity resolver throws an exception whenever it is asked to resolve any entity other than the one you want it to resolve. If you would rather let the parser go ahead and load such remote entities, replace the throw statement with return null.
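As a rough sketch of the "serve a local copy, otherwise fall back" variant: the resolver below answers one known system ID from a string and returns null for everything else, which tells the parser to use its default lookup. The DTD URL, the DTD text and the class and method names are placeholders invented for this example, not part of the original setup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdResolverDemo {
    // Hypothetical values, for illustration only.
    static final String DTD_URL = "http://example.invalid/demo.dtd";
    static final String LOCAL_DTD = "<!ENTITY greeting \"hello\">";

    /** Parses the given XML and returns its concatenated character data. */
    static String textOf(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                if (DTD_URL.equals(systemId)) {
                    // Serve the local copy; nothing is downloaded for this ID.
                    return new InputSource(new StringReader(LOCAL_DTD));
                }
                // null asks the parser to resolve the entity the default way.
                return null;
            }
        });
        final StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // &greeting; is declared only in the locally served DTD.
        String xml = "<!DOCTYPE doc SYSTEM \"" + DTD_URL + "\">"
                + "<doc>&greeting; world</doc>";
        System.out.println(textOf(xml));
    }
}
```

Because the entity is declared only in the local DTD text, the parse succeeding is a quick way to confirm that the resolver's copy was actually used.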

Answer 2

Try setting a feature on your DocumentBuilderFactory:

URL url = new URL(urlString);
InputStream is = url.openStream();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db;
db = dbf.newDocumentBuilder();
Document result = db.parse(is);

Right now I'm experiencing the same problem inside XSLT (2) when calling the document() function to analyse external XHTML pages.
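To tie this back to the question, a DOM built this way can be handed to the Transformer as a DOMSource. The following is only a minimal sketch under a few assumptions: the feature URI is specific to Xerces-based parsers (the one bundled with the JDK accepts it, another parser may throw a ParserConfigurationException); the class name, the sample document and the trimmed-down stylesheet are invented for the example; and namespace awareness is switched on as a precaution, since XSLT processors generally expect a namespace-aware tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NoExternalDtdDomDemo {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Simplified stand-ins for the question's stylesheet and input page.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"div[@id='bodyContent']/p[1]\">"
            + "<xsl:value-of select=\".\"/></xsl:template>"
            + "<xsl:template match=\"text()\"/>"
            + "</xsl:stylesheet>";
    static final String XML =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html><body><div id=\"bodyContent\"><p>first</p><p>second</p></div></body></html>";

    static String transform(String xslt, String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Xerces-specific: do not fetch the external DTD when not validating.
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XSLT, XML));
    }
}
```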

Answer 3

The previous answers led me to a solution, but it wasn't obvious to me, so here is a complete one:

private void convert(InputStream xsltInputStream, InputStream srcInputStream, OutputStream destOutputStream) throws SAXException, ParserConfigurationException,
        TransformerFactoryConfigurationError, TransformerException, IOException {
    //create a parser with a fake entity resolver to disable DTD download and validation
    XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    xmlReader.setEntityResolver(new EntityResolver() {
        public InputSource resolveEntity(String pid, String sid) throws SAXException {
            return new InputSource(new ByteArrayInputStream(new byte[] {}));
        }
    });
    //create the transformer
    Source xsltSource = new StreamSource(xsltInputStream);
    Transformer transformer = TransformerFactory.newInstance().newTransformer(xsltSource);
    //create the source for the XML document which uses the reader with fake entity resolver
    Source xmlSource = new SAXSource(xmlReader, new InputSource(srcInputStream));
    transformer.transform(xmlSource, new StreamResult(destOutputStream));
}
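For trying the same idea without any network access, here is a small in-memory sketch. It assumes the JDK's built-in parser and XSLT processor; the class name, the sample page and the shortened stylesheet are made up for illustration, and the parser factory is made namespace-aware as a precaution because XSLT processors generally expect namespace-aware SAX events.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class EmptyDtdTransformDemo {
    // Simplified stand-ins for the question's stylesheet and input page.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"div[@id='bodyContent']/p[1]\">"
            + "<xsl:value-of select=\".\"/></xsl:template>"
            + "<xsl:template match=\"text()\"/>"
            + "</xsl:stylesheet>";
    static final String XML =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html><body><div id=\"bodyContent\"><p>first</p><p>second</p></div></body></html>";

    static String transform(String xslt, String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Answer every entity request (including the DTD) with empty content.
        reader.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(
                new SAXSource(reader, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XSLT, XML));
    }
}
```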

Answer 4

If you use

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

you can try disabling DTD validation with the following code:

 dbf.setValidating(false);
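One caveat that may be worth checking on your own setup: false is already the factory default for setValidating, and a non-validating parser may still try to read the external DTD (for instance to pick up entity declarations). The sketch below, with invented class and method names and a made-up sample page, records which system IDs the parser asks for while validation is off. The resolver hands back an empty stand-in, so nothing is downloaded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class ValidatingFlagDemo {
    // Made-up sample page carrying the DOCTYPE from the question.
    static final String XML =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html><body><p>text</p></body></html>";

    /** Returns the system IDs the parser tried to load with validation off. */
    static List<String> requestedDtds(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        DocumentBuilder db = dbf.newDocumentBuilder();

        final List<String> requested = new ArrayList<String>();
        db.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                requested.add(systemId);
                // Empty stand-in, so no network access takes place.
                return new InputSource(new StringReader(""));
            }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("validating by default: "
                + DocumentBuilderFactory.newInstance().isValidating());
        System.out.println("DTDs requested: " + requestedDtds(XML));
    }
}
```

If the printed list is not empty on your parser, turning validation off is not enough on its own, and an EntityResolver or a parser-specific feature is still needed to keep the DTD from being fetched.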

Answer 5

You need to use javax.xml.parsers.DocumentBuilderFactory:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource src = new InputSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award");
// Parse the InputSource itself: getByteStream() is null when only a system ID was given.
Document xmlDocument = builder.parse(src);
DOMSource source = new DOMSource(xmlDocument);
TransformerFactory tf = TransformerFactory.newInstance();
// xsltSource is the stylesheet Source built in the question.
Transformer transformer = tf.newTransformer(xsltSource);
transformer.transform(source, new StreamResult(System.out));

5 answers

Answer 1
Score 5, accepted, 2009-10-15 16:59:48

Answer 2
Score 3, 2009-11-11 10:09:44

Answer 3
Score 2, 2013-11-19 12:55:43

Answer 4
Score 0, 2011-03-21 04:03:32

Answer 5
Score -1, 2009-10-15 16:06:29
