
Using a schema to reorder the elements of an XML document in conformance with the schema

Say I have an XML document (represented as text, a W3C DOM, whatever), and also an XML Schema. The XML document has all the right elements as defined by the schema, but in the wrong order.

How do I use the schema to "re-order" the elements in the document to conform to the ordering defined by the schema?

I know that this should be possible, probably using XSOM, since the JAXB XJC code generator annotates its generated classes with the correct serialization order of the elements.

However, I'm not familiar with the XSOM API, and it's pretty dense, so I'm hoping one of you lot has some experience with it, and can point me in the right direction. Something like "what child elements are permitted inside this parent element, and in what order?"


Let me give an example.

I have an XML document like this:

<A>
   <Y/>
   <X/>
</A>

I have an XML Schema which says that the contents of <A> must be an <X> followed by a <Y>. Now clearly, if I try to validate the document against the schema, it fails, since the <X> and <Y> are in the wrong order. But I know my document is "wrong" in advance, so I'm not using the schema to validate just yet. However, I do know that my document has all of the correct elements as defined by the schema, just in the wrong order.

What I want to do is to programmatically examine the schema (probably using XSOM, which is an object model for XML Schema), and ask it what the contents of <A> should be. The API would expose the information that "you need an <X> followed by a <Y>".
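To make the "ask the schema" step concrete, here is a minimal sketch that uses only the JDK. It does not use XSOM: it reads the XSD as a plain DOM document, which is a deliberate simplification. The class name SchemaOrder and the method childOrder are made up for illustration, and the sketch assumes the element is declared with an inline complexType whose content is a single xs:sequence of locally named elements (no ref, no nested groups, no choice).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: a JDK-only stand-in for a real schema object model.
class SchemaOrder {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Parses an XSD held in a string into a namespace-aware DOM. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the child element names declared, in order, for the named element. */
    static List<String> childOrder(Document xsd, String elementName) {
        List<String> names = new ArrayList<>();
        NodeList decls = xsd.getElementsByTagNameNS(XS, "element");
        for (int i = 0; i < decls.getLength(); i++) {
            Element decl = (Element) decls.item(i);
            if (!elementName.equals(decl.getAttribute("name"))) continue;
            // Assumption: inline <xs:complexType><xs:sequence>...</xs:sequence></xs:complexType>
            Element type = firstChild(decl, "complexType");
            Element seq = type == null ? null : firstChild(type, "sequence");
            if (seq == null) continue;
            for (Node n = seq.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && XS.equals(n.getNamespaceURI())
                        && "element".equals(n.getLocalName())) {
                    names.add(((Element) n).getAttribute("name"));
                }
            }
            break; // use the first declaration that has a sequence
        }
        return names;
    }

    /** Returns the first child element in the XSD namespace with this local name, or null. */
    static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

For a schema that declares <A> as a sequence of <X> then <Y>, childOrder(xsd, "A") would return the names X and Y in that order. A real schema model such as XSOM would also resolve references, named types and groups, which this sketch ignores.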

So I take my XML document (using a DOM API) and re-arrange <X> and <Y> accordingly, so that the document now validates against the schema.
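The re-arranging step itself needs nothing beyond the W3C DOM. The following is a rough sketch, assuming the required order is already known as a list of element names and that the document does not use namespace prefixes; DomReorder and reorderChildren are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomReorder {
    /** Moves the element children of parent into the given name order. */
    static void reorderChildren(Element parent, List<String> order) {
        // Snapshot the element children first; the live child list changes as nodes move.
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) children.add((Element) n);
        }
        for (String name : order) {
            for (Element child : children) {
                if (name.equals(child.getNodeName())) {
                    // appendChild on an already attached node moves it to the end of parent.
                    parent.appendChild(child);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element a = doc.createElement("A");
        doc.appendChild(a);
        a.appendChild(doc.createElement("Y"));
        a.appendChild(doc.createElement("X"));

        reorderChildren(a, Arrays.asList("X", "Y"));

        // Print the child names after reordering.
        for (Node n = a.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName());
        }
    }
}
```

Children whose names are not in the list, and any text nodes, are left where they are and end up ahead of the moved elements. Repeated elements with the same name keep their relative order.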

It's important to understand what XSOM is here: it's a Java API which represents the information contained in an XML Schema, not the information contained in my instance document.

What I don't want to do is generate code from the schema, since the schema is unknown at build time. Furthermore, XSLT is no use, since the correct ordering of the elements is determined solely by the data dictionary contained in the schema.

Hopefully that's now explicit enough.

I don't have a good answer to this yet, but I have to note that there is potential for ambiguity there. Consider this schema:

<xs:element name="root">
  <xs:choice>
    <xs:sequence>
      <xs:element name="foo"/>
      <xs:element name="bar">
        <xs:element name="dee"/>
        <xs:element name="dum"/>
      </xs:element>
    </xs:sequence>
    <xs:sequence>
      <xs:element name="bar">
        <xs:element name="dum"/>
        <xs:element name="dee"/>
      </xs:element>
      <xs:element name="foo"/>
    </xs:sequence>
  </xs:choice>
</xs:element>

and this input XML:

<root>
  <foo/>
  <bar>
    <dum/>
    <dee/>
  </bar>
</root>

This could be made to comply with the schema either by reordering <foo> and <bar>, or by reordering <dee> and <dum>. There doesn't seem to be any reason to prefer one over the other.

I was stuck on the same problem for around two weeks before I finally got a breakthrough. This can be achieved using JAXB's marshalling/unmarshalling feature.

In JAXB, XML validation is an optional part of marshalling and unmarshalling. If we do not call the setSchema(schema) method when creating the Marshaller and Unmarshaller objects, no validation is performed.

So now,

  1. If a mandatory element (as per the XSD) is missing from the XML, it is overlooked.
  2. If the XML contains a tag that is not in the XSD, no error is thrown, and the tag is absent from the new XML produced by unmarshalling and then marshalling.
  3. If elements are out of sequence, they are reordered. This is done by the JAXB-generated POJOs that we pass when creating the JAXBContext.
  4. If an element is misplaced inside some other tag, it is omitted from the new XML. No error is thrown while marshalling/unmarshalling.

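import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;

import org.apache.commons.io.FileUtils;
import org.xml.sax.InputSource;

public class JAXBSequenceUtil {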
  public static void main(String[] args) throws JAXBException, IOException {

    String xml = FileUtils.readFileToString(new File(
            "./conf/out/Response_103_1015700001&^&IOF.xml"));

    System.out.println("Before marshalling : \n" + xml);
    String sequencedXml = correctSequence(xml,
            "org.acord.standards.life._2");
    System.out.println("After marshalling : \n" + sequencedXml);
  }

  /**
   * @param xml
   *            - XML string to be corrected for sequence.
   * @param jaxbPackage
   *            - package containing JAXB generated classes using XSD.
   * @return String - xml with corrected sequence
   * @throws JAXBException
   */
  public static String correctSequence(String xml, String jaxbPackage)
        throws JAXBException {
    JAXBContext jaxbContext = JAXBContext.newInstance(jaxbPackage);
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
    Object txLifeType = unmarshaller.unmarshal(new InputSource(
            new StringReader(xml)));
    System.out.println(txLifeType);

    StringWriter stringWriter = new StringWriter();
    Marshaller marshaller = jaxbContext.createMarshaller();
    marshaller.marshal(txLifeType, stringWriter);

    return stringWriter.toString();
  }
}

Your problem translates to this: you have an XML file that doesn't match the schema and you want to transform it into something that's valid.

With XSOM, you can read the structure in the XSD and perhaps analyze the XML, but you would still need an additional mapping from the invalid form to the valid form. Using a stylesheet would be much easier, because you would walk through the XML, using XPath expressions to handle the elements in the proper order. For an XML document where you want apples before pears, the stylesheet would first copy the apple node (/Fruit/Apple) and then copy the pear node. That way, no matter what the order is in the old file, they would be in the correct order in the new file.
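The apple/pear idea can be sketched with the XSLT support built into the JDK (javax.xml.transform). The stylesheet below is hand-written for a made-up <Fruit> document with <Apple> and <Pear> children; the class name FruitReorder is only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FruitReorder {
    // Hand-written stylesheet: copy every Apple first, then every Pear.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/Fruit'>"
        + "<Fruit>"
        + "<xsl:copy-of select='Apple'/>"
        + "<xsl:copy-of select='Pear'/>"
        + "</Fruit>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the result as text. */
    static String reorder(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reorder("<Fruit><Pear/><Apple/></Fruit>"));
    }
}
```

Note that this stylesheet silently drops any child of <Fruit> that is neither an Apple nor a Pear, so a real one would need a rule for every element the schema allows.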

What you could do with XSOM is read the XSD and generate the stylesheet that will re-order the data, then transform the XML using that stylesheet. Once XSOM has generated a stylesheet for the XSD, you can re-use it until the XSD is modified or another XSD is needed.

Of course, you could also use XSOM to copy the nodes directly in the right order. But that means your code has to walk through all nodes and child nodes itself, which might take some time. A stylesheet would do the same work, but I would expect the transformer to process it faster: it can work directly on the data, while the Java code would have to get and set every node through the document object's properties.


So, I would use XSOM to generate a stylesheet for the XSD which just copies the XML node by node, and re-use it over and over again. The stylesheet would only need to be regenerated when the XSD changes, and it should perform faster than having the Java API walk through the nodes itself. The stylesheet doesn't care about the order of the input, so the output would always end up in the right order.

To make it more interesting, you could skip XSOM altogether and try to work with a stylesheet that reads the XSD and generates another stylesheet from it. This generated stylesheet would copy the XML nodes in the exact order defined in the schema. Would it be complex? Actually, the stylesheet would need to generate a template for every element and make sure the child elements of that element are processed in the correct order.

When I think about this, I wonder if this has been done before already. It would be very generic and would be able to handle almost every XSD/XML.

Let's see... Using "//xsd:element/@name" you would get all element names in the schema. Every unique name would need to be translated to a template. Within these templates, you would need to process the child nodes of the specific element, which are slightly more complex to get. Elements can have a reference, which you would need to follow. Otherwise, get all the xsd:element nodes nested under it.
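As a rough illustration of that first step, the XPath expression can be evaluated from Java with the JDK's javax.xml.xpath package. The class and method names here are made up, and the "xsd" prefix is bound by hand to the XML Schema namespace, since XPath cannot resolve it on its own.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaElementNames {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Returns the unique values of every xsd:element/@name in the schema text. */
    static Set<String> elementNames(String xsdText) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed so the xsd prefix maps to a namespace
        Document xsd = f.newDocumentBuilder().parse(new InputSource(new StringReader(xsdText)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "xsd" prefix used in the expression to the schema namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "xsd".equals(prefix) ? XS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) {
                return XS.equals(uri) ? "xsd" : null;
            }
            @Override
            public Iterator<String> getPrefixes(String uri) {
                return XS.equals(uri)
                        ? Collections.singletonList("xsd").iterator()
                        : Collections.<String>emptyIterator();
            }
        });

        NodeList attrs = (NodeList) xpath.evaluate(
                "//xsd:element/@name", xsd, XPathConstants.NODESET);
        Set<String> names = new LinkedHashSet<>(); // drops duplicate names
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeValue());
        }
        return names;
    }
}
```

Elements declared only through ref carry no name attribute, so they do not show up here; following those references is the extra work mentioned above.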

Basically you want to take the root element and from there recursively look at the children in the document and the children defined in the schema and make the order match.

I'll give you a C#-syntax solution, since that's what I code in day and night, and it's pretty close to Java. Note that I'll have to take guesses about XSOM since I don't know its API. I've also made up the XML DOM methods, since giving you the C# ones probably wouldn't help :)

// Assume the first call is:
//   SortChildrenIntoNewDocument( sourceDom.DocumentElement, targetDom.DocumentElement, schema.RootElement )

public void SortChildrenIntoNewDocument( XmlElement source, XmlElement target, SchemaElement schemaElement )
{
    // whatever method you use to ask XSOM for the correct contents, in schema order
    SchemaElement[] orderedChildren = schemaElement.GetChildren();
    for( int i = 0; i < orderedChildren.Length; i++ )
    {
        XmlElement sourceChild = source.SelectChildByName( orderedChildren[ i ].Name );
        if( sourceChild == null )
        {
            // assumes the made-up SelectChildByName returns null when nothing matches
            continue;
        }
        XmlElement targetChild = target.AddChild( sourceChild );
        // recurse into the child's own contents
        SortChildrenIntoNewDocument( sourceChild, targetChild, orderedChildren[ i ] );
    }
}

I wouldn't recommend a recursive method if it's going to be a deep tree; in that case you would have to create some 'tree walker' type objects. The advantage of that approach is that you can handle more complex things. For example, when the schema says you can have 0-or-more of an element, you can keep processing source nodes until there are no more that match, then move the schema walker on from there.
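A non-recursive variant could look roughly like the Java sketch below, which uses an explicit stack instead of the call stack. The childOrder map (parent element name to ordered child names) is a hypothetical stand-in for whatever the schema model would supply, and IterativeSorter and sortTree are made-up names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class IterativeSorter {
    /**
     * Reorders every element in the tree under root without recursion.
     * childOrder maps a parent element name to its ordered child names
     * (an assumed input, standing in for a real schema model).
     */
    static void sortTree(Element root, Map<String, List<String>> childOrder) {
        Deque<Element> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Element parent = pending.pop();

            // Snapshot the element children before moving anything.
            List<Element> children = new ArrayList<>();
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) children.add((Element) n);
            }

            List<String> order = childOrder.get(parent.getNodeName());
            if (order != null) {
                for (String name : order) {
                    // Take every source child with this name, so 0-or-more
                    // occurrences stay together in their original relative order.
                    for (Element child : children) {
                        if (name.equals(child.getNodeName())) parent.appendChild(child);
                    }
                }
            }

            // Queue the children so their own contents get sorted too.
            for (Element child : children) pending.push(child);
        }
    }
}
```

Because the map is keyed by element name alone, this sketch cannot handle a schema where the same element name has different content models in different places.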
