How to avoid reading of DTD when parsing XML file in Java?

I need to parse an XML document that begins with the following lines:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml producer="poppler" version="0.22.0">
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
    <fontspec id="0" size="12" family="Times" color="#000000"/>

I read it with the following code:

    final DocumentBuilder builder;
    DocumentBuilderFactory builderFactory =
            DocumentBuilderFactory.newInstance();

    builder = builderFactory.newDocumentBuilder();

    Document document = builder.parse(
            new FileInputStream(aXmlFileName));

The last call fails with the following exception:

Exception in thread "main" java.io.FileNotFoundException: D:\dev\ro-2014-04-13-01\pdf2xml.dtd
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:613)

The file pdf2xml.dtd is indeed not in the specified directory.

How can I modify the code so that the document is parsed even though pdf2xml.dtd is missing?

You need to set an EntityResolver on the DocumentBuilder before calling parse:

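    // Register the resolver on the same builder, before builder.parse(...).
    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("pdf2xml.dtd")) {
                // Hand the parser a stand-in with no declarations instead of the missing file.
                return new InputSource(new ByteArrayInputStream(
                        "<?xml version='1.0' encoding='UTF-8'?>".getBytes(StandardCharsets.UTF_8)));
            }
            // Returning null lets the parser resolve any other entity as usual.
            return null;
        }
    });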

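When the parser reaches the reference to "pdf2xml.dtd", it invokes the entity resolver, which returns an empty document in place of the DTD, so the missing file is never opened.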
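For reference, here is a minimal self-contained sketch of the same idea that can be run as-is. The class name `DtdSkippingParser`, the method name `parseWithoutDtd`, and the inline sample document (shortened from the one in the question) are made up for illustration. It also swaps the XML-declaration byte stream for an empty `StringReader`, which is a common variant of this trick rather than what the snippet above does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named here only for illustration.
final class DtdSkippingParser {

    // Parses the stream, substituting an empty DTD for any reference to pdf2xml.dtd.
    static Document parseWithoutDtd(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("pdf2xml.dtd")) {
                // An empty character stream stands in for the missing DTD file.
                return new InputSource(new StringReader(""));
            }
            // null tells the parser to fall back to its default resolution.
            return null;
        });
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample based on the document in the question.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE pdf2xml SYSTEM \"pdf2xml.dtd\">\n"
                + "<pdf2xml producer=\"poppler\" version=\"0.22.0\">\n"
                + "<page number=\"1\" position=\"absolute\" top=\"0\" left=\"0\""
                + " height=\"1263\" width=\"892\"/>\n"
                + "</pdf2xml>";
        Document doc = parseWithoutDtd(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getAttribute("producer")); // prints "poppler"
    }
}
```

Since the resolver only intercepts system identifiers ending in `pdf2xml.dtd`, any other external entity is still resolved the normal way. If the DTD defines default attribute values or entities that the document relies on, those will be missing from the parsed result, so this is only appropriate when the DTD content is not needed.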

