
How to "canonicalize" arbitrary XML (by reordering all attributes and elements)

I have some code that generates an *.xsd file from a set of JAXB-annotated classes:

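import java.io.IOException;
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.transform.Result;
import javax.xml.transform.dom.DOMResult;
import org.apache.xml.serialize.OutputFormat;   // assuming the Xerces serializer classes
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;

JAXBContext context = JAXBContext.newInstance(/* build from set of classes */);
final DOMResult result = new DOMResult(); // will hold the XSD output
context.generateSchema(new SchemaOutputResolver() {
    @Override
    public Result createOutput(String namespaceUri, String suggestedFileName) throws IOException {
        return result;
    }
});
Document doc = (Document) result.getNode(); // getNode() returns Node, so cast to Document
OutputFormat format = new OutputFormat(doc);
format.setIndenting(true);
StringWriter writer = new StringWriter();
XMLSerializer serializer = new XMLSerializer(writer, format);
serializer.serialize(doc);
String xsd = writer.toString();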

The problem is that the XSD produced (stored in xsd) comes out in random order: two runs with the same input may generate logically identical XSDs with different element order, which plays havoc with diff tools when the result is written out to a file.

How do I "canonicalize" the XML inside xsd?

I've seen some references to XSLT in related questions, but everything I saw required listing the elements in advance. I'm looking for something that works on any XML input.

There is no public spec I'm aware of that attempts to specify a canonical form for XSD schema documents. So there won't be off-the-shelf tools; you must either roll your own or decide (as Mathias Müller suggests) that diff is not your friend here.

Note that off-the-shelf canonicalization tools may normalize the order of attribute-value specifications in the input document, but they will never attempt to normalize the sequence of elements, since in the general case the sequence of elements is significant in XML.

When I've been in this situation, I've specified a 'canonical' form that would minimize headaches for me (list all top-level elements in alpha order, then all top-level complex types in alpha order, then all top-level simple types in alpha order, ...) and written an XSLT stylesheet to sort the elements appropriately.

If that suffices for your purposes (that is, if it's the sequence of top-level constructs that's causing your problems), it's easy enough to do (assuming you have enough knowledge of XSLT to write a near-identity transform that sorts the top-level declarations, or can write an equivalent transformation in some other technology).
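As a rough illustration of such a near-identity transform, here is a minimal sketch that embeds an XSLT 1.0 stylesheet in Java and runs it through the JDK's built-in javax.xml.transform API. The class name SchemaSorter and the method sortTopLevel are made up for this example, and the sort key (the name attribute) and group order are only one possible choice of 'canonical' form. Everything that is not a top-level element, complexType or simpleType (imports, includes, annotations, groups, ...) is assumed to be fine in its original relative order and is emitted first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: sorts the top-level declarations of a schema document.
class SchemaSorter {
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:output method='xml' indent='yes'/>"
      // Identity template: copies anything not matched by a more specific rule.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Schema root: unsorted constructs first, then three groups sorted by @name.
      + "<xsl:template match='/xs:schema'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:apply-templates select='node()[not(self::xs:element"
      + " or self::xs:complexType or self::xs:simpleType)]'/>"
      + "<xsl:apply-templates select='xs:element'>"
      + "<xsl:sort select='@name'/></xsl:apply-templates>"
      + "<xsl:apply-templates select='xs:complexType'>"
      + "<xsl:sort select='@name'/></xsl:apply-templates>"
      + "<xsl:apply-templates select='xs:simpleType'>"
      + "<xsl:sort select='@name'/></xsl:apply-templates>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the schema text with its top-level declarations in sorted order.
    static String sortTopLevel(String xsd) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xsd)), new StreamResult(out));
        return out.toString();
    }
}
```

Calling SchemaSorter.sortTopLevel(xsd) on the string from the question should then give the same text for two runs that differ only in the order of their top-level declarations.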

If the schema generation is also inconsistent regarding the structure of the individual declarations, then you may also need to normalize declaration structure (sort the children of xsd:choice alphabetically, sort attribute references and declarations alphabetically or by type or however you like, normalize model group structures, ...). Depending on how exuberantly your schema generator varies its output, this can in theory become arbitrarily complicated. But in practice, I expect that the problem will be tractable for anyone with adequate knowledge of XSD and XSLT (or some other XML processing technology).
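For the xsd:choice case, one way to do this in "some other XML processing technology" is directly on the DOM before serializing. The sketch below is only an illustration: the class ChoiceSorter and its sort key (local name, then name, then ref) are invented for this example, and it assumes a namespace-aware DOM. It keeps xsd:annotation first, since the schema grammar requires that, and it leaves whitespace text nodes where they are, so the result should be re-indented on output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: puts the children of every xsd:choice into a fixed order.
class ChoiceSorter {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    static void sortChoices(Document doc) {
        // Snapshot the live NodeList before moving any nodes around.
        NodeList live = doc.getElementsByTagNameNS(XS, "choice");
        List<Element> choices = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            choices.add((Element) live.item(i));
        }
        for (Element choice : choices) {
            sortChildren(choice);
        }
    }

    static void sortChildren(Element parent) {
        List<Element> kids = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                kids.add((Element) n);
            }
        }
        kids.sort(Comparator.comparing(ChoiceSorter::sortKey));
        // appendChild moves a node that is already in the tree, so this reorders in place.
        for (Element kid : kids) {
            parent.appendChild(kid);
        }
    }

    // Assumed sort key: annotation first, then local name, then @name, then @ref.
    static String sortKey(Element e) {
        String rank = "annotation".equals(e.getLocalName()) ? "0" : "1";
        return rank + "|" + e.getLocalName() + "|" + e.getAttribute("name")
                + "|" + e.getAttribute("ref");
    }
}
```

The same sortChildren idea could be applied to attribute declarations inside a complex type, but not to xsd:sequence, where the order of the children is part of the content model.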

You will also, of course, have to take steps to get the line breaks and whitespace in the schema documents under control; the XSLT serialization controls for indenting output should help you here.
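In Java, those serialization controls are available as output properties on a javax.xml.transform.Transformer, so the DOM from the question can be written out without the Xerces serializer. This is a minimal sketch with an invented class name (StableSerializer). The indent-amount key is an implementation-specific property understood by Xalan-derived transformers, including the one bundled with the JDK; other implementations may ignore it.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

// Hypothetical helper: serializes a DOM node with fixed output settings.
class StableSerializer {
    static String serialize(Node node) throws TransformerException {
        // newTransformer() with no stylesheet gives an identity transform.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Implementation-specific (Xalan-style) property; assumed to be honored here.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }
}
```

Indenting only adds whitespace; it does not remove whitespace text nodes that are already in the tree, so it is worth stripping those first (for example with xsl:strip-space) if the generator's own whitespace varies.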
