

Using a schema to reorder the elements of an XML document in conformance with the schema

Say I have an XML document (represented as text, a W3C DOM, whatever), and also an XML Schema. The XML document has all the right elements as defined by the schema, but in the wrong order.

How do I use the schema to "re-order" the elements in the document to conform to the ordering defined by the schema?

I know that this should be possible, probably using XSOM, since the JAXB XJC code generator annotates its generated classes with the correct serialization order of the elements.

However, I'm not familiar with the XSOM API, and it's pretty dense, so I'm hoping one of you lot has some experience with it and can point me in the right direction. Something like: "what child elements are permitted inside this parent element, and in what order?"


Let me give an example.

I have an XML document like this:

<A>
   <Y/>
   <X/>
</A>

I have an XML Schema which says that the contents of <A> must be an <X> followed by a <Y>. Now clearly, if I try to validate the document against the schema, it fails, since the <X> and <Y> are in the wrong order. But I already know in advance that my document is "wrong", so I'm not using the schema to validate just yet. However, I do know that my document has all of the correct elements as defined by the schema, just in the wrong order.
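To see the failure concretely, here is a minimal sketch using the JDK's built-in javax.xml.validation API (not XSOM). The schema string is an assumed stand-in for the one described above, requiring <A> to contain <X> followed by <Y>; the class name OrderValidation is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OrderValidation {
    // Assumed schema for the example: <A> must contain <X> followed by <Y>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='A'><xs:complexType><xs:sequence>"
        + "<xs:element name='X'/><xs:element name='Y'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if xml validates against the schema, false on the first validation error. */
    static boolean isValid(String xml) {
        try {
            // With no ErrorHandler set, the validator throws on the first error,
            // e.g. when <Y> appears where <X> is expected.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this schema, `<A><X/><Y/></A>` should validate and `<A><Y/><X/></A>` should not, which is exactly the mismatch the question wants to repair before validating.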

What I want to do is programmatically examine the schema (probably using XSOM, which is an object model for XML Schema) and ask it what the contents of <A> should be. The API would expose the information that "you need an <X> followed by a <Y>".

So I take my XML document (using a DOM API) and re-arrange <X> and <Y> accordingly, so that the document now validates against the schema.
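The re-arranging step itself can be sketched with plain DOM calls, assuming the expected child order has already been obtained from the schema as a simple list of names (how that list is produced, via XSOM or otherwise, is left open here). DomReorder and its helper methods are hypothetical; only the org.w3c.dom and javax.xml.parsers calls are standard JDK API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomReorder {
    /** Parses namespace-aware so that getLocalName() is populated. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Element children of parent, in document order. */
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) out.add((Element) n);
        }
        return out;
    }

    /**
     * Moves the element children of parent into the order given by expectedOrder.
     * Names missing from expectedOrder go last; equal names keep their relative order.
     */
    static void reorderChildren(Element parent, List<String> expectedOrder) {
        List<Element> children = childElements(parent);
        children.sort(Comparator.comparingInt((Element e) -> {
            int i = expectedOrder.indexOf(e.getLocalName());
            return i < 0 ? Integer.MAX_VALUE : i;
        }));
        for (Element child : children) {
            parent.appendChild(child); // appendChild first removes a node that is already in the tree
        }
    }

    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Element e : childElements(parent)) names.add(e.getLocalName());
        return names;
    }
}
```

Because List.sort is stable, repeated elements keep their original relative order, and children whose names are not in the list end up last. Whitespace text nodes are not moved, and namespaces are ignored (matching is on local names only).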

It's important to understand what XSOM is here: it's a Java API which represents the information contained in an XML Schema, not the information contained in my instance document.

What I don't want to do is generate code from the schema, since the schema is unknown at build time. Furthermore, XSLT is no use, since the correct ordering of the elements is determined solely by the data dictionary contained in the schema.

Hopefully that's now explicit enough.

I don't have a good answer to this yet, but I have to note that there is potential for ambiguity here. Consider this schema:

<xs:element name="root">
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <xs:element name="foo"/>
        <xs:element name="bar">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="dee"/>
              <xs:element name="dum"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:sequence>
        <xs:element name="bar">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="dum"/>
              <xs:element name="dee"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="foo"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>

and this input XML:

<root>
  <foo/>
  <bar>
    <dum/>
    <dee/>
  </bar>
</root>

This could be made to comply with the schema either by reordering <foo> and <bar>, or by reordering <dee> and <dum>. There doesn't seem to be any reason to prefer one over the other.

I was stuck with the same problem for around two weeks. Finally I got the breakthrough: this can be achieved using JAXB's marshalling/unmarshalling.

With JAXB marshalling/unmarshalling, XML validation is optional. So when creating the Marshaller and Unmarshaller objects, we do not call the setSchema(schema) method. Omitting this step skips schema validation during marshalling/unmarshalling.

So now:

  1. If a mandatory element defined in the XSD is missing from the XML, this is silently overlooked.
  2. If the XML contains a tag that is not defined in the XSD, no error is thrown, and the tag is simply absent from the new XML produced by unmarshalling and re-marshalling.
  3. If elements are out of sequence, they are reordered. This is done by the JAXB-generated POJOs, whose package we pass when creating the JAXBContext.
  4. If an element is misplaced inside some other tag, it is omitted from the new XML. No error is thrown during marshalling/unmarshalling.

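import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;

import org.xml.sax.InputSource;

public class JAXBSequenceUtil {
  public static void main(String[] args) throws JAXBException, IOException {

    String xml = new String(Files.readAllBytes(Paths.get(
            "./conf/out/Response_103_1015700001&^&IOF.xml")), StandardCharsets.UTF_8);

    System.out.println("Before marshalling : \n" + xml);
    String sequencedXml = correctSequence(xml,
            "org.acord.standards.life._2");
    System.out.println("After marshalling : \n" + sequencedXml);
  }

  /**
   * Round-trips the XML through JAXB without a schema, so the element
   * order on output comes from the generated classes.
   *
   * @param xml
   *            - XML string to be corrected for sequence.
   * @param jaxbPackage
   *            - package containing the JAXB classes generated from the XSD.
   * @return String - xml with corrected sequence
   * @throws JAXBException
   */
  public static String correctSequence(String xml, String jaxbPackage)
        throws JAXBException {
    JAXBContext jaxbContext = JAXBContext.newInstance(jaxbPackage);
    // No setSchema(...) call: validation stays off, so out-of-order input is accepted.
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
    Object txLifeType = unmarshaller.unmarshal(new InputSource(
            new StringReader(xml)));

    StringWriter stringWriter = new StringWriter();
    Marshaller marshaller = jaxbContext.createMarshaller();
    // Marshalling writes the elements in the order recorded in the generated classes.
    marshaller.marshal(txLifeType, stringWriter);

    return stringWriter.toString();
  }
}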

Your problem translates to this: you have an XML file that doesn't match the schema, and you want to transform it into something that's valid.

With XSOM, you can read the structure of the XSD and perhaps analyze the XML, but you would still need an additional mapping from the invalid form to the valid form. Using a stylesheet would be much easier, because you would walk through the XML, using XPath expressions to handle the elements in the proper order. For XML in which apples must come before pears, the stylesheet would first copy the apple node (/Fruit/Apple) and then copy the pear node. That way, no matter what the order was in the old file, the nodes would be in the correct order in the new file.
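A minimal sketch of the apples-before-pears stylesheet, run through the JDK's built-in javax.xml.transform API. The Fruit, Apple and Pear names follow the example above; the FruitOrder class is hypothetical. As written, the stylesheet copies only Apple and Pear children, so anything else under <Fruit> would be dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FruitOrder {
    // Copies every Apple, then every Pear, regardless of their order in the input.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/Fruit'>"
        + "<Fruit>"
        + "<xsl:copy-of select='Apple'/>"
        + "<xsl:copy-of select='Pear'/>"
        + "</Fruit>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String apply(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```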

What you could do with XSOM is read the XSD and generate a stylesheet that re-orders the data, then transform the XML using that stylesheet. Once XSOM has generated a stylesheet for the XSD, you can simply reuse it until the XSD is modified or another XSD is needed.
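The generate-once, reuse-many idea could look roughly like this, assuming the child order of each parent element has already been extracted from the XSD into a map (that extraction step is not shown). StylesheetGenerator and its methods are hypothetical names. A compiled javax.xml.transform.Templates object is the JDK's reusable, thread-safe form of a stylesheet, which is what makes reusing it cheap.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetGenerator {
    /** Builds an XSLT 1.0 stylesheet with one template per parent, copying its children in the given order. */
    static String generate(Map<String, List<String>> childOrder) {
        StringBuilder xsl = new StringBuilder()
            .append("<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>")
            .append("<xsl:output method='xml' omit-xml-declaration='yes'/>")
            // Identity template: elements without a map entry are copied unchanged.
            .append("<xsl:template match='@*|node()'>")
            .append("<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>")
            .append("</xsl:template>");
        for (Map.Entry<String, List<String>> entry : childOrder.entrySet()) {
            xsl.append("<xsl:template match='").append(entry.getKey()).append("'>")
               .append("<xsl:copy><xsl:apply-templates select='@*'/>");
            for (String child : entry.getValue()) {
                // Each apply-templates pulls all children with that name, so input order no longer matters.
                xsl.append("<xsl:apply-templates select='").append(child).append("'/>");
            }
            xsl.append("</xsl:copy></xsl:template>");
        }
        return xsl.append("</xsl:stylesheet>").toString();
    }

    /** Compiles the stylesheet once; the Templates object can then be reused for many documents. */
    static Templates compile(String xslt) {
        try {
            return TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet does not compile", e);
        }
    }

    static String transform(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

Limitations of this sketch: names must be plain, unprefixed element names; under a parent that has a map entry, children not listed (and any text directly inside it) are dropped; every other element is copied unchanged by the identity template.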

Of course, you could also use XSOM to copy the nodes in the right order directly. But since your code then has to walk through all nodes and child nodes itself, it might take a while to finish. A stylesheet does the same work, but the transformer should be able to process it all faster: it can work directly on the data, whereas the Java code would have to get/set every node through the XMLDocument properties.


So I would use XSOM to generate a stylesheet for the XSD that simply copies the XML node by node, and reuse it over and over again. The stylesheet would only need to be rewritten when the XSD changes, and it should perform faster than having the Java API walk through the nodes itself. The stylesheet doesn't care about the input order, so the output would always end up in the right order.

To make it more interesting, you could skip XSOM entirely and write a stylesheet that reads the XSD and generates another stylesheet from it. The generated stylesheet would copy the XML nodes in exactly the order defined in the schema. Would it be complex? Somewhat: the generating stylesheet would need to produce a template for every element and make sure the child elements of that element are processed in the correct order.

Thinking about it, I wonder whether this has already been done. It would be very generic and able to handle almost any XSD/XML.

Let's see... Using "//xsd:element/@name" you would get all element names in the schema. Every unique name would need to be translated into a template. Within these templates, you would need to process the child nodes of the specific element, which are slightly more complex to get. Elements can have a reference, which you would need to follow. Otherwise, get all of the element's child xsd:element nodes.
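A rough Java rendering of that XPath recipe, using the JDK's javax.xml.xpath rather than a stylesheet. The xsd prefix is bound to the XML Schema namespace in code, so it works whatever prefix the schema itself uses. It assumes anonymous complexType/sequence content only: named types, groups, choices and extensions are not handled, and local declarations that share a name are merged. XsdChildOrder is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XsdChildOrder {
    private static final String XS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    /** Maps each declared element name to the names of its xs:sequence children, in schema order. */
    static Map<String, List<String>> childOrders(String xsd) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for the prefixed XPath steps to match
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "xsd".equals(prefix) ? XS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XS.equals(uri) ? "xsd" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XS.equals(uri) ? Collections.singletonList("xsd").iterator()
                                          : Collections.<String>emptyIterator();
                }
            });

            // Every element name declared anywhere in the schema.
            NodeList names = (NodeList) xpath.evaluate("//xsd:element/@name", doc, XPathConstants.NODESET);
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (int i = 0; i < names.getLength(); i++) {
                String name = names.item(i).getNodeValue();
                if (result.containsKey(name)) continue; // one entry (one template) per unique name
                NodeList kids = (NodeList) xpath.evaluate(
                    "//xsd:element[@name='" + name + "']/xsd:complexType/xsd:sequence/xsd:element",
                    doc, XPathConstants.NODESET);
                List<String> order = new ArrayList<>();
                for (int k = 0; k < kids.getLength(); k++) {
                    Element kid = (Element) kids.item(k);
                    String ref = kid.getAttribute("ref"); // "" when the attribute is absent
                    // A ref points at a global declaration (which gets its own entry); keep its local part.
                    order.add(ref.isEmpty() ? kid.getAttribute("name")
                                            : ref.substring(ref.indexOf(':') + 1));
                }
                result.put(name, order);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read schema", e);
        }
    }
}
```

Each map entry corresponds to one template you would generate: the key is the element to match, and the list is the order in which its children should be processed.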

Basically, you want to start at the root element and, from there, recursively compare the children in the document with the children defined in the schema, making the order match.

I'll give you a C#-syntax solution, since that's what I code in day and night; it's pretty close to Java. Note that I'll have to make guesses about XSOM, since I don't know its API. I've also made up the XML DOM methods, since giving you the C# ones probably wouldn't help :)

// assume the first call is SortChildrenIntoNewDocument( sourceDom.DocumentElement, targetDom.DocumentElement, schema.RootElement )
// SchemaElement, GetChildren, SelectChildByName and AddChild are made-up names for the XSOM and DOM calls

public void SortChildrenIntoNewDocument( XmlElement source, XmlElement target, SchemaElement schemaElement )
{
    // whatever method you use to ask XSOM for the correct contents, in schema order
    SchemaElement[] orderedChildren = schemaElement.GetChildren();
    for( int i = 0; i < orderedChildren.Length; i++ )
    {
        XmlElement sourceChild = source.SelectChildByName( orderedChildren[ i ].Name );
        if( sourceChild == null )
            continue; // optional element missing from the source document
        // AddChild is assumed to make a shallow copy (name and attributes only);
        // the recursive call then adds its children in schema order
        XmlElement targetChild = target.AddChild( sourceChild );
        // recursive call
        SortChildrenIntoNewDocument( sourceChild, targetChild, orderedChildren[ i ] );
    }
}
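For comparison, here is the same recursion in Java against the standard W3C DOM. The schema side is modelled by a hypothetical SchemaElement class (a name plus ordered children) standing in for whatever XSOM would return. Like the C# version, it takes only the first source child with each name; it also skips optional elements that are missing from the source, and copies the content of leaf elements (such as text) unchanged.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SchemaSort {
    /** Hypothetical stand-in for the schema model: an element name and its children in schema order. */
    static final class SchemaElement {
        final String name;
        final List<SchemaElement> children;
        SchemaElement(String name, SchemaElement... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    private static DocumentBuilderFactory factory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // so getLocalName() is populated
        return f;
    }

    static Document parse(String xml) {
        try {
            return factory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static Element firstChildNamed(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getLocalName())) return (Element) n;
        }
        return null;
    }

    static void sortChildrenIntoNewDocument(Element source, Element target, SchemaElement schemaElement) {
        Document doc = target.getOwnerDocument();
        if (schemaElement.children.isEmpty()) {
            // Leaf in the schema model: copy its content unchanged.
            for (Node n = source.getFirstChild(); n != null; n = n.getNextSibling()) {
                target.appendChild(doc.importNode(n, true));
            }
            return;
        }
        for (SchemaElement childSchema : schemaElement.children) {
            Element sourceChild = firstChildNamed(source, childSchema.name);
            if (sourceChild == null) continue; // optional element absent from the source
            // Shallow copy: name and attributes only; the recursive call adds the children in order.
            Element targetChild = (Element) doc.importNode(sourceChild, false);
            target.appendChild(targetChild);
            sortChildrenIntoNewDocument(sourceChild, targetChild, childSchema);
        }
    }

    /** Entry point: creates the target document and its root, then recurses. */
    static Document sortedCopy(Document source, SchemaElement rootSchema) {
        try {
            Document target = factory().newDocumentBuilder().newDocument();
            Element root = (Element) target.importNode(source.getDocumentElement(), false);
            target.appendChild(root);
            sortChildrenIntoNewDocument(source.getDocumentElement(), root, rootSchema);
            return target;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```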

I wouldn't recommend a recursive method if the tree is going to be deep; in that case you would have to create some 'tree walker' type objects. The advantage of that approach is that you'll be able to handle more complex cases: for example, when the schema says you can have zero or more of an element, you can keep processing source nodes until no more match, and then move the schema walker on from there.
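A sketch of the 'tree walker' variant in Java, under similar assumptions: the schema is modelled by a hypothetical Particle class carrying an element name, a maxOccurs value and ordered child particles, not by real XSOM objects. An explicit stack replaces recursion, and for each particle the walker keeps consuming matching source children until none are left or maxOccurs is reached, then moves on to the next particle.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SchemaWalker {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    /** Hypothetical schema particle: element name, maxOccurs and child particles in schema order. */
    static final class Particle {
        final String name;
        final int maxOccurs;
        final List<Particle> children;
        Particle(String name, int maxOccurs, Particle... children) {
            this.name = name;
            this.maxOccurs = maxOccurs;
            this.children = Arrays.asList(children);
        }
    }

    /** One pending unit of work: fill target from source according to particle. */
    private static final class Step {
        final Element source, target;
        final Particle particle;
        Step(Element source, Element target, Particle particle) {
            this.source = source; this.target = target; this.particle = particle;
        }
    }

    private static DocumentBuilder builder() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static Document sortedCopy(Document sourceDoc, Particle rootParticle) {
        Document doc = builder().newDocument();
        Element root = (Element) doc.importNode(sourceDoc.getDocumentElement(), false);
        doc.appendChild(root);
        Deque<Step> stack = new ArrayDeque<>(); // explicit stack instead of recursion
        stack.push(new Step(sourceDoc.getDocumentElement(), root, rootParticle));
        while (!stack.isEmpty()) {
            Step step = stack.pop();
            if (step.particle.children.isEmpty()) { // leaf: copy content unchanged
                for (Node n = step.source.getFirstChild(); n != null; n = n.getNextSibling())
                    step.target.appendChild(doc.importNode(n, true));
                continue;
            }
            List<Element> remaining = new ArrayList<>();
            for (Node n = step.source.getFirstChild(); n != null; n = n.getNextSibling())
                if (n.getNodeType() == Node.ELEMENT_NODE) remaining.add((Element) n);
            for (Particle p : step.particle.children) {
                // Consume matching source children until none are left or maxOccurs is reached.
                int taken = 0;
                for (Iterator<Element> it = remaining.iterator(); it.hasNext() && taken < p.maxOccurs; ) {
                    Element src = it.next();
                    if (!p.name.equals(src.getLocalName())) continue;
                    it.remove();
                    taken++;
                    Element copy = (Element) doc.importNode(src, false); // shallow copy
                    step.target.appendChild(copy);
                    stack.push(new Step(src, copy, p));
                }
            }
            // Anything still in 'remaining' matched no particle and is dropped.
        }
        return doc;
    }
}
```

Source elements that match no particle are silently dropped in this sketch, and elements beyond maxOccurs are dropped as well.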

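Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.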

 
© 2020-2024 STACKOOM.COM