
What is the best way to validate multiple XML files against an XSD?

Original post: 2018-04-29 03:47:57. Tags: java, xml, validation, xsd

I am working on a project that requires validating many XML files against their XSDs. The trouble is that many of the XSD files depend on other XSDs, which makes the usual validation approach cumbersome. Is there an elegant way to resolve this?

If possible, I would prefer to work with these files in memory: they are not stored in a directory structure that matches their import paths.

Note that I am working in Java.

1 answer

I assume here that you are working with JAXP, so that you can call setSchema() on either SAXParserFactory or DocumentBuilderFactory.
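For reference, a minimal sketch of what setSchema() looks like on a DocumentBuilderFactory. The class name SetSchemaSketch, the helper isValid and the one-element schema are made up for illustration; SAXParserFactory offers the same setSchema() method.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SetSchemaSketch {
    // Hypothetical one-element schema, kept inline so the sketch is self-contained.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    /** Parses xml with a schema-aware DocumentBuilder; returns false on a validation error. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSD validation needs namespace awareness
        dbf.setSchema(schema);       // the parser validates while it parses

        DocumentBuilder builder = dbf.newDocumentBuilder();
        boolean[] valid = {true};
        // Without a handler, validation errors are only reported, not thrown.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { valid[0] = false; }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return valid[0];
    }
}
```

Calling SetSchemaSketch.isValid("<note>hi</note>") should report a valid document, while a document with an undeclared root element should not.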

One solution I was part of was to read all XSD sources into one aggregated Schema object using SchemaFactory.newSchema(Source[] schemas). This aggregated Schema was then able to validate any XML document that referenced any "top" schema; all imported schemas had to be part of the aggregated schema. As I remember it, the Source array had to be ordered by dependency: if Schema A imported Schema B, Schema B had to occur before Schema A in the array.
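A rough sketch of the aggregated approach, working purely from in-memory strings. The two schemas, their urn:example namespaces and the class name AggregatedSchemaSketch are invented for illustration. The sketch assumes the import carries no schemaLocation, so that it is satisfied by the schema loaded earlier in the array; this matches my recollection of how the JDK's built-in implementation behaves, not a guarantee for every parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class AggregatedSchemaSketch {
    // Hypothetical schema B: defines a three-character code type.
    static final String SCHEMA_B =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:b'>"
      + "<xs:simpleType name='Code'><xs:restriction base='xs:string'>"
      + "<xs:length value='3'/></xs:restriction></xs:simpleType>"
      + "</xs:schema>";

    // Hypothetical schema A: imports B's namespace (no schemaLocation) and uses its type.
    static final String SCHEMA_A =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:b='urn:example:b'"
      + " targetNamespace='urn:example:a' elementFormDefault='qualified'>"
      + "<xs:import namespace='urn:example:b'/>"
      + "<xs:element name='item' type='b:Code'/>"
      + "</xs:schema>";

    /** Builds one Schema from both sources; B comes first because A depends on it. */
    static Schema buildSchema() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Source[] sources = {
            new StreamSource(new StringReader(SCHEMA_B)),
            new StreamSource(new StringReader(SCHEMA_A))
        };
        return factory.newSchema(sources);
    }

    /** Validates one document; a Validator throws SAXException on the first error by default. */
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator(); // cheap; create one per document or thread
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

The Schema object is immutable and can be shared, so buildSchema() only needs to run once; each validation then creates its own Validator.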

Also, as I recall, <include> didn't work very well with this mechanism.

Another solution would be to set an LSResourceResolver on the SchemaFactory. You would have to implement your own LSResourceResolver that serves byte or character streams based on the input to the resolver. I haven't personally used or researched this solution.
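An untested-in-production sketch of what such a resolver might look like, serving schema text from an in-memory map. The class name InMemoryResolverSketch, the map keyed by the literal schemaLocation value ("b.xsd") and both schemas are assumptions made for illustration. The LSInput is obtained from the standard DOMImplementationLS factory rather than implemented by hand.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

class InMemoryResolverSketch {
    // Hypothetical imported schema, held only in memory.
    static final String SCHEMA_B =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:b'>"
      + "<xs:simpleType name='Code'><xs:restriction base='xs:string'>"
      + "<xs:length value='3'/></xs:restriction></xs:simpleType>"
      + "</xs:schema>";

    // Hypothetical top schema; its schemaLocation does not exist on disk.
    static final String SCHEMA_A =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:b='urn:example:b'"
      + " targetNamespace='urn:example:a' elementFormDefault='qualified'>"
      + "<xs:import namespace='urn:example:b' schemaLocation='b.xsd'/>"
      + "<xs:element name='item' type='b:Code'/>"
      + "</xs:schema>";

    /** Returns a resolver that looks up the requested systemId in the given map. */
    static LSResourceResolver resolverFor(Map<String, String> schemasByLocation) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        return (type, namespaceURI, publicId, systemId, baseURI) -> {
            String text = schemasByLocation.get(systemId);
            if (text == null) {
                return null; // unknown resource: the parser falls back to its default lookup
            }
            LSInput input = ls.createLSInput();
            input.setStringData(text);
            input.setSystemId(systemId);
            input.setBaseURI(baseURI);
            return input;
        };
    }

    /** Compiles the top schema; the import is served by the resolver. */
    static Schema buildSchema() throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(resolverFor(Collections.singletonMap("b.xsd", SCHEMA_B)));
        return factory.newSchema(new StreamSource(new StringReader(SCHEMA_A)));
    }

    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

A real resolver would probably key on namespaceURI as well, since different files may use the same relative schemaLocation.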

The first solution of course has the benefit that schema parsing and processing can be done once and reused for all validations that follow, something that will probably be difficult to achieve with the second option.

Another thing to keep in mind (depending on your context): it is a good design choice to control the whole "resolving" process (i.e. control how the parsers get access to external resources), from a performance as well as a security perspective.
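One way to take that control, sketched here as an assumption rather than something I have deployed, is to switch off the factory's own external lookups with the JAXP 1.5 access properties, so that any schema it cannot obtain from you is refused rather than fetched from disk or the network. The class name LockedDownFactorySketch and the helper compiles are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class LockedDownFactorySketch {
    /** Creates a factory that may not fetch external DTDs or schemas on its own. */
    static SchemaFactory newLockedDownFactory() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // An empty string means "no protocol allowed" (no file:, no http:, ...).
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory;
    }

    /** Returns true if the schema text compiles under the locked-down factory. */
    static boolean compiles(String xsd) {
        try {
            newLockedDownFactory().newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false; // includes imports the factory was not allowed to fetch
        }
    }
}
```

A self-contained schema should still compile with this factory, while one whose import points at a schemaLocation the factory would have to fetch itself should fail to compile.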