Validate XML against multiple arbitrary schemas

Consider an XML document that starts like the following with multiple schemas (this is NOT a Spring-specific question; this is just a convenient XML doc for the example):

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:jaxrs="http://cxf.apache.org/jaxrs"
           xmlns:osgi="http://www.springframework.org/schema/osgi"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
                   http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
               http://cxf.apache.org/jaxrs
                   http://cxf.apache.org/schemas/jaxrs.xsd
               http://www.springframework.org/schema/osgi
                   http://www.springframework.org/schema/osgi/spring-osgi.xsd">

I want to validate the document, but I don't know in advance which namespaces the document author will use. I trust the document author, so I'm willing to download arbitrary schema URLs. How do I implement my validator?

I know that I can specify my schemas with a DocumentBuilderFactory instance by calling setAttribute("http://java.sun.com/xml/jaxp/properties/schemaSource", new String[] {...}), but I don't know the schema URLs until the document is parsed.

Of course, I could extract the XSD URLs myself after parsing the document and then run it through the validator, specifying the "http://java.sun.com/xml/jaxp/properties/schemaSource" attribute as above, but surely there's already an implementation that does that automatically?
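For reference, the manual extraction step is small. Below is a minimal sketch, assuming the hints sit on the root element (xsi:schemaLocation may legally appear on nested elements too, which this ignores) and that the document was parsed namespace-aware. The class name SchemaLocationHints and its methods are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.w3c.dom.Element;

/** Hypothetical helper: turns an xsi:schemaLocation value into namespace -> XSD location pairs. */
public final class SchemaLocationHints {
    private static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    private SchemaLocationHints() {}

    /** Splits the whitespace-separated "namespace location namespace location ..." list. */
    public static Map<String, String> parse(String schemaLocation) {
        Map<String, String> hints = new LinkedHashMap<>();
        if (schemaLocation == null || schemaLocation.trim().isEmpty()) {
            return hints; // no hints at all
        }
        String[] tokens = schemaLocation.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("schemaLocation must contain namespace/location pairs");
        }
        for (int i = 0; i < tokens.length; i += 2) {
            hints.put(tokens[i], tokens[i + 1]);
        }
        return hints;
    }

    /** Reads the hints from a root element; requires a namespace-aware parse. */
    public static Map<String, String> fromRoot(Element root) {
        // getAttributeNS returns "" when the attribute is absent, which parse() treats as "no hints".
        return parse(root.getAttributeNS(XSI_NS, "schemaLocation"));
    }
}
```

The locations could then be handed to the schemaSource attribute for a second, validating parse, e.g. hints.values().toArray(new String[0]).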

I haven't confirmed this, but you might find the example "Use JAXP Validation API to create a validator and validate input from a DOM which contains inline schemas and multiple validation roots" useful.

In particular,

    factory.setFeature(SCHEMA_FULL_CHECKING_FEATURE_ID, schemaFullChecking);
    factory.setFeature(HONOUR_ALL_SCHEMA_LOCATIONS_ID, honourAllSchemaLocations);
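To make the JAXP Validation API route concrete, here is a rough sketch, not taken from the linked example. It relies on the no-argument SchemaFactory.newSchema(), which for W3C XML Schema returns a Schema that locates its schema documents from the location hints in the instance document. The two feature URLs are Xerces-specific, so the sketch treats them as optional; the class name LocationHintValidator is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: validates a document against whatever schemas its own hints name. */
public final class LocationHintValidator {
    private LocationHintValidator() {}

    /** Returns the problems found; an empty list means the document validated. */
    public static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Xerces-specific features; silently skipped on other implementations.
            trySetFeature(factory, "http://apache.org/xml/features/validation/schema-full-checking");
            trySetFeature(factory, "http://apache.org/xml/features/honour-all-schemaLocations");

            // No-arg newSchema(): schemas are resolved from xsi:schemaLocation hints at validation time.
            Validator validator = factory.newSchema().newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Malformed XML, unreadable input, and similar failures end up here.
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    private static void trySetFeature(SchemaFactory factory, String feature) {
        try {
            factory.setFeature(feature, true);
        } catch (SAXException e) {
            // Feature not recognized or not supported: carry on without it.
        }
    }
}
```

A document whose root element has no matching schema hint comes back with at least one problem, since the validator cannot find a declaration for it.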

Forgive me for answering my own question... The other answers from @Eugene Yokota and @forty-two were VERY helpful, but I thought they were not complete enough to accept. I needed to do additional work to compose the suggestions into the final solution below. The following works perfectly under JDK 1.6. It does not have sufficient error checking (the link in Eugene's answer points to a very complete solution, but one that is not reusable), and I believe it does not cache the downloaded XSDs. I think it relies on features specific to the Xerces parser, given the apache.org feature URLs.

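    import java.io.InputStream;
    import java.util.logging.Level;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;

    // LOG is assumed to be a java.util.logging.Logger defined elsewhere.
    InputStream xmlStream = ...

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    factory.setXIncludeAware(true);
    // Validate against W3C XML Schema rather than a DTD.
    factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
    // Xerces-specific features (note the apache.org URLs).
    factory.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
    factory.setFeature("http://apache.org/xml/features/honour-all-schemaLocations", true);
    factory.setFeature("http://apache.org/xml/features/validate-annotations", true);
    factory.setFeature("http://apache.org/xml/features/generate-synthetic-annotations", true);

    DocumentBuilder builder = factory.newDocumentBuilder();
    // This handler only logs; validation errors do not abort the parse.
    builder.setErrorHandler(new ErrorHandler() {
        public void warning(SAXParseException exception) throws SAXException {
            LOG.log(Level.WARNING, "parse warn: " + exception, exception);
        }
        public void error(SAXParseException exception) throws SAXException {
            LOG.log(Level.SEVERE, "parse error: " + exception, exception);
        }
        public void fatalError(SAXParseException exception) throws SAXException {
            LOG.log(Level.SEVERE, "parse fatal: " + exception, exception);
        }
    });

    // The schemas named in xsi:schemaLocation are fetched and applied during this call.
    Document doc = builder.parse(xmlStream);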
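Since the handler above only logs, the caller cannot tell afterwards whether the document was actually valid. One possible way to close that gap, as a sketch only, is a handler that collects the problems so the caller can check them once parse() returns. The class name CollectingErrorHandler is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical handler: remembers errors instead of just logging them. */
public final class CollectingErrorHandler implements ErrorHandler {
    private final List<SAXParseException> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // Warnings do not affect validity in this sketch.
    }

    @Override
    public void error(SAXParseException e) {
        problems.add(e); // recoverable validation error, parsing continues
    }

    @Override
    public void fatalError(SAXParseException e) {
        problems.add(e); // the parser will normally stop after this
    }

    /** True when no error or fatal error was reported. */
    public boolean isValid() {
        return problems.isEmpty();
    }

    /** Read-only view of everything reported so far. */
    public List<SAXParseException> problems() {
        return Collections.unmodifiableList(problems);
    }
}
```

It would be installed with builder.setErrorHandler(handler) before parsing, and handler.isValid() checked afterwards.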

If you create a DocumentBuilderFactory like so:

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    dbf.setNamespaceAware(true);
    dbf.setAttribute(
            "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
            "http://www.w3.org/2001/XMLSchema");

You can then set an EntityResolver on the DocumentBuilder instances created by this factory to get a chance to resolve the schema locations referred to by the schemaLocation directives. The specified location will be present in the systemId argument.

I thought the builder would do this automatically, without specifying a resolver, but apparently not out of the box. Maybe it is controlled by another feature, attribute or property?
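A resolver along those lines might look like the sketch below. It records every systemId it is asked about and serves a local copy when it has one. The class name RecordingSchemaResolver and the in-memory map of local copies are assumptions made for the example; returning null is the standard EntityResolver way of telling the parser to open the systemId itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: serves known schemas from memory and records what was requested. */
public final class RecordingSchemaResolver implements EntityResolver {
    private final Map<String, String> localCopies; // systemId -> schema text
    private final List<String> requested = new ArrayList<>();

    public RecordingSchemaResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        String text = localCopies.get(systemId);
        if (text == null) {
            return null; // let the parser fetch the location on its own
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setSystemId(systemId); // keep the original location for error messages
        return source;
    }

    /** The systemIds seen so far, in request order. */
    public List<String> requested() {
        return requested;
    }
}
```

It would be installed with builder.setEntityResolver(resolver) before calling parse(); swapping the map for an on-disk cache would also cover the missing XSD caching.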
