
Java change and move non-standard XML file

I am using a third party application and would like to change one of its files. The file is stored in XML but with an invalid doctype.

When I try to read it with a DocumentBuilder, it errors out because the doctype contains "file:///ReportWiz.dtd" (as shown, with quotes) and I get a "cannot find file" exception. Is there a way to tell the DocumentBuilder to ignore this? I have tried calling setValidating(false) and setNamespaceAware(false) on the DocumentBuilderFactory.

The only solutions I can think of are

  • copying the file line by line into a new file while omitting the offending line, doing what I need to do, then copying the result into another new file and inserting the offending line back in, or
  • doing mostly the same as above but working with a FileStream of some sort (though I am not clear on how I could do this... help?)

DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setValidating(false);
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(file);

Tell your DocumentBuilderFactory not to load the external DTD by switching off this Xerces feature:

docFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

See here for a list of available features.
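
For completeness, here is a minimal sketch of that setting inside a whole program. The class name and the sample document are made up for illustration; the feature URI is the Xerces one quoted above, which the JDK's built-in parser (derived from Xerces) also accepts. Other parsers may reject it with a ParserConfigurationException.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this sketch.
class SkipExternalDtd {
    // Xerces-specific feature URI from the answer above.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setValidating(false);
        // Throws ParserConfigurationException if the parser does not support the feature.
        docFactory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        return docBuilder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; the DTD it names does not exist on disk.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE report SYSTEM \"file:///ReportWiz.dtd\">\n"
                + "<report><title>demo</title></report>";
        Document doc = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

In real use the InputStream would come from the file itself, for example via java.nio.file.Files.newInputStream.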

You also might find JDOM a lot easier to work with than org.w3c.dom:

org.jdom.input.SAXBuilder builder = new org.jdom.input.SAXBuilder();
builder.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
org.jdom.Document doc = builder.build(file);

Handle resolution of the DTD manually, either by returning a copy of the DTD file (loaded from the classpath) or by returning an empty one. You can do this by setting an entity resolver on your document builder:

    EntityResolver er = new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if ("file:///ReportWiz.dtd".equals(systemId)) {
                System.out.println(systemId);
                InputStream zeroData = new ByteArrayInputStream(new byte[0]);
                return new InputSource(zeroData);
            }
            return null;
        }
    };
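
The resolver only takes effect once it is registered with DocumentBuilder.setEntityResolver. A minimal sketch of the wiring follows, with a hypothetical class name and sample document. It deviates from the snippet above in two labeled ways: it matches on the end of the system ID rather than the whole string, in case the parser hands over an expanded URI, and it returns an empty character stream instead of an empty byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class EmptyDtdResolverDemo {
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder docBuilder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

        EntityResolver er = (publicId, systemId) -> {
            // Assumption: matching on the suffix is enough to identify the DTD.
            if (systemId != null && systemId.endsWith("ReportWiz.dtd")) {
                // Hand the parser an empty DTD instead of the missing file.
                return new InputSource(new StringReader(""));
            }
            return null; // fall back to the parser's default resolution
        };

        // Without this call the resolver is never consulted.
        docBuilder.setEntityResolver(er);
        return docBuilder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        String xml = "<!DOCTYPE report SYSTEM \"file:///ReportWiz.dtd\">"
                + "<report><title>demo</title></report>";
        Document doc = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Because this relies only on the standard SAX EntityResolver interface, it does not depend on any Xerces-specific feature.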

My first thought was dealing with it as a stream. You could make a new adapter at some level and just copy input to output except for the offending text.

If the file is shortish (under half a gig or so) you could also read the entire thing into a byte array and make your modifications there, then create a new stream from the byte array into your builder.

That's the advantage of the amazingly bulky way Java handles streams: you actually have a lot of flexibility.
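
The byte-array variant could look roughly like the sketch below. It rests on assumptions not stated above: the file is UTF-8, and the DOCTYPE declaration has no internal subset (no `[ ... ]` part), so a simple regular expression can find it. The class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this sketch.
class StripDoctypeDemo {
    // Matches a DOCTYPE declaration without an internal subset.
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>\\[]*>");

    // Assumption: the bytes are UTF-8 encoded XML.
    static byte[] stripDoctype(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        String cleaned = DOCTYPE.matcher(text).replaceFirst("");
        return cleaned.getBytes(StandardCharsets.UTF_8);
    }

    static Document parse(byte[] raw) throws Exception {
        // Build a new stream over the modified bytes and feed it to the builder.
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(stripDoctype(raw)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample; in real use: Files.readAllBytes(path).
        byte[] raw = ("<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE report SYSTEM \"file:///ReportWiz.dtd\">\n"
                + "<report><title>demo</title></report>").getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(raw).getDocumentElement().getTagName());
    }
}
```

Since the declaration is removed before parsing, the resulting Document has no doctype node, so the line has to be re-added by hand when the file is written back.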

If you don't want to assume that the parser is Xerces and would like a generic solution, see this.

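The other thing I was contemplating was storing the whole file in a String, manipulating it there, and then writing the String back out to a file. None of these seem clean or easy, but what is the best way to do this?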
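
Whichever way the file is read, the third-party application presumably still expects the DOCTYPE line when the file is written back. One possible sketch, assuming the document was parsed with external DTD loading switched off (so the doctype node is still present in the DOM), is to copy the system ID onto the Transformer's output properties. The class name, method names and sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical class name, used only for this sketch.
class KeepDoctypeOnSave {
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Xerces-specific feature; other parsers may not support it.
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        DocumentType dt = doc.getDoctype();
        if (dt != null && dt.getSystemId() != null) {
            // Ask the serializer to emit <!DOCTYPE root SYSTEM "..."> again.
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dt.getSystemId());
        }
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        Document doc = parseIgnoringDtd(
                "<!DOCTYPE report SYSTEM \"file:///ReportWiz.dtd\">"
                + "<report><title>demo</title></report>");
        doc.getDocumentElement().setAttribute("edited", "true");
        System.out.println(serialize(doc));
    }
}
```

To write to disk instead of a String, the StreamResult can wrap a java.io.File or an OutputStream.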
