DTD Info and Related Errors when Validating (XSD Schema) - Can They Be Ignored?

So I've got a large number of XML files. For years they've caused trouble because the people who write them do so by hand, so errors naturally occur. It's high time we got around to validating them and providing feedback on what's wrong when someone tries to use these XML files.

I'm using the SAX parser and getting a list of errors.

Below is my code:

    BookValidationErrorHandler errorHandler = new BookValidationErrorHandler();

    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    factory.setNamespaceAware(true);

    SchemaFactory schemaFactory =
        SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

    factory.setSchema(schemaFactory.newSchema(
        new Source[] {new StreamSource("test.xsd")}));

    javax.xml.parsers.SAXParser parser = factory.newSAXParser();
    org.xml.sax.XMLReader reader = parser.getXMLReader();

    reader.setErrorHandler(errorHandler);
    reader.parse(new InputSource("bad.xml"));

The first couple of errors are always:

Line Number: 2: Document is invalid: no grammar found.
Line Number: 2: Document root element "credits", must match DOCTYPE root "null".

We can't possibly go and edit the thousands of XML files that need to be checked.

Is there anything I can easily add to the front of the source to prevent this? Is there a way to tell the parser to ignore these DTD-related errors? I'm not even sure what the grammar one means. I sort of understand what the second one means.

Calling setValidating(true) requests DTD validation, which fails if the document has no DTD. If you only want schema validation and not DTD validation, use setValidating(false). From the Javadoc for setValidating():

To use modern schema languages such as W3C XML Schema or RELAX NG instead of DTD, you can configure your parser to be a non-validating parser by leaving the setValidating(boolean) method false, then use the setSchema(Schema) method to associate a schema to a parser.
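
As a rough illustration of that advice, here is a minimal, self-contained sketch. The inline schema, the sample documents, and the class name SchemaOnlyValidation are made up for the example (only the "credits" root element name comes from the question), and the error handler just collects messages instead of using BookValidationErrorHandler:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaOnlyValidation {

    // Hypothetical schema: a <credits> root holding one or more <name> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='credits'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Parses the XML with XSD validation only and returns the collected error messages. */
    static List<String> validate(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // false = do not request DTD validation
        factory.setSchema(schema);    // schema validation is driven by this instead

        List<String> errors = new ArrayList<>();
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Record recoverable validation errors and keep parsing.
                errors.add("Line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Neither document has a DOCTYPE; only schema errors should be reported.
        System.out.println(validate("<credits><name>A</name></credits>"));
        System.out.println(validate("<credits><title>A</title></credits>"));
    }
}
```

With this configuration the documents themselves do not need to be edited: they carry no DOCTYPE and no schema reference, and the schema is supplied entirely from the calling code.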

If you are using a JAXP-compliant parser and configure it as described in the Oracle documentation, you can still use a validating parser without presetting the schema in the parser:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(true);
SAXParser saxParser = spf.newSAXParser();
// Important next step: tell the parser which XML schema-definition language to expect.
saxParser.setProperty(
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
    "http://www.w3.org/2001/XMLSchema");
// Now when we parse a file without a DTD, we no longer get an error
// (as long as the file itself references an XSD schema, for example
// through xsi:schemaLocation or xsi:noNamespaceSchemaLocation):
saxParser.parse(source, handler);
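
If the XML files do not reference a schema themselves, JAXP also defines a companion schemaSource property that can supply one from code. The sketch below is an assumption-laden illustration, not part of the answer above: the schemaSource property, the inline schema, the sample documents, and the class name JaxpPropertyValidation are all additions for the example, and it assumes the parser bundled with the JDK, which requires schemaLanguage to be set before schemaSource.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpPropertyValidation {

    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    // Assumption: not used in the answer above; JAXP companion property for the schema itself.
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: a <credits> root holding one or more <name> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='credits'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Parses the XML with a validating parser switched to XSD and returns the error messages. */
    static List<String> validate(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setValidating(true);
        SAXParser saxParser = spf.newSAXParser();

        // Order matters: the schema language has to be set before the schema source.
        saxParser.setProperty(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        saxParser.setProperty(SCHEMA_SOURCE, new InputSource(new StringReader(XSD)));

        List<String> errors = new ArrayList<>();
        saxParser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Record recoverable validation errors and keep parsing.
                errors.add("Line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<credits><name>A</name></credits>"));
        System.out.println(validate("<credits><title>A</title></credits>"));
    }
}
```

Note that these properties cannot be combined with SAXParserFactory.setSchema(); pick one mechanism or the other.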

I recently ran into the same problem and found this thread while looking for a solution. My solution was to use an EntityResolver. Setting the Schema alone does not seem to be enough, at least not for me. This is an EntityResolver example:

import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CustomResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {

        // Map each known system ID to the location of the matching XSD.
        if ("http://namespace1.example.com/ex1".equals(systemId)) {
            return new InputSource("xsd_for_namespace1_path");
        } else if ("http://namespace2.example.com/ex2".equals(systemId)) {
            return new InputSource("xsd_for_namespace2_path");
        } else if ("http://namespace3.example.com/ex3".equals(systemId)) {
            return new InputSource("xsd_for_namespace3_path");
        }

        // Returning null lets the parser fall back to its default resolution.
        return null;
    }
}

I also disabled DTD validation with setValidating(false). This is my parser configuration:

SAXParserFactory saxpf = SAXParserFactory.newInstance();
saxpf.setNamespaceAware(true);
saxpf.setSchema(getSchema());
saxpf.setValidating(false);
SAXParser saxParser = saxpf.newSAXParser();
// getXMLReader() replaces the deprecated getParser(); CustomResolver is the class shown above.
saxParser.getXMLReader().setEntityResolver(new CustomResolver());

The getSchema() method instantiates a Schema the same way you do in your code, but with more sources.

I hope this helps anyone who runs into the same error.
