How to avoid reading the DTD when parsing an XML file in Java?
I need to parse an XML document that starts with the following lines:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml producer="poppler" version="0.22.0">
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<fontspec id="0" size="12" family="Times" color="#000000"/>
I read it using the following code:
final DocumentBuilder builder;
DocumentBuilderFactory builderFactory =
DocumentBuilderFactory.newInstance();
builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(
new FileInputStream(aXmlFileName));
The last call fails with the following exception:
Exception in thread "main" java.io.FileNotFoundException: D:\dev\ro-2014-04-13-01\pdf2xml.dtd
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:613)
The file pdf2xml.dtd does not actually exist in the specified directory.
How can I modify the code so that the document is parsed despite the absence of pdf2xml.dtd?
You need to use an EntityResolver:
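// builder is the DocumentBuilder from the question; set the resolver before calling parse()
builder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId.contains("pdf2xml.dtd")) {
            // Hand the parser an (almost) empty stream instead of the missing DTD file
            return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
        } else {
            // null tells the parser to resolve any other entity the default way
            return null;
        }
    }
});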
When the parser encounters the reference to pdf2xml.dtd, it calls the entity resolver, which returns an essentially empty document in place of the DTD, so the parser never tries to open the missing file.
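For reference, here is how the resolver might fit into a complete program. This is only a minimal sketch: the class name SkipDtdExample, the helper parseIgnoringDtd, and the inline sample document (a shortened, closed version of the XML from the question) are assumptions made for illustration, and the input is read from a byte array instead of a file so the example is self-contained. It also returns an empty StringReader rather than a stream holding only an XML declaration, which should behave the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SkipDtdExample {

    // Hypothetical helper: parses the given XML bytes, substituting an
    // empty entity whenever the parser asks for pdf2xml.dtd.
    static Document parseIgnoringDtd(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // EntityResolver has a single method, so a lambda can be used.
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("pdf2xml.dtd")) {
                // Empty content stands in for the missing DTD
                return new InputSource(new StringReader(""));
            }
            // null means: fall back to the parser's default resolution
            return null;
        });

        return builder.parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample input; the closing tags are added for illustration
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE pdf2xml SYSTEM \"pdf2xml.dtd\">\n"
                + "<pdf2xml producer=\"poppler\" version=\"0.22.0\">\n"
                + "<page number=\"1\"/>\n"
                + "</pdf2xml>\n";

        Document doc = parseIgnoringDtd(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Keep in mind that with the DTD replaced by empty content, anything the real DTD would have supplied (default attribute values, entity declarations) is not available to the parser.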