Say I have an XML document (represented as text, a W3C DOM, whatever), and also an XML Schema. The XML document has all the right elements as defined by the schema, but in the wrong order.
How do I use the schema to "re-order" the elements in the document to conform to the ordering defined by the schema?
I know that this should be possible, probably using XSOM, since the JAXB XJC code generator annotates its generated classes with the correct serialization order of the elements.
However, I'm not familiar with the XSOM API, and it's pretty dense, so I'm hoping one of you lot has some experience with it, and can point me in the right direction. Something like "what child elements are permitted inside this parent element, and in what order?"
Let me give an example.
I have an XML document like this:
<A>
<Y/>
<X/>
</A>
I have an XML Schema which says that the contents of <A> must be an <X> followed by a <Y>. Now clearly, if I try to validate the document against the schema, it fails, since the <X> and <Y> are in the wrong order. But I know my document is "wrong" in advance, so I'm not using the schema to validate just yet. However, I do know that my document has all of the correct elements as defined by the schema, just in the wrong order.
What I want to do is to programmatically examine the Schema (probably using XSOM - which is an object model for XML Schema), and ask it what the contents of <A> should be. The API will expose the information that "you need an <X> followed by a <Y>".
So I take my XML document (using a DOM API) and re-arrange <X> and <Y> accordingly, so that now the document will validate against the schema.
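For the re-arranging step on its own, here is a minimal Java sketch using only the JDK's DOM API. It assumes the expected order has already been obtained from the schema somehow and is passed in as a plain list of names. The names ChildReorderer, reorderChildren and reorder are made up for illustration, and namespaces are ignored.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ChildReorderer {

    /** Moves the child elements of parent into the order given by names (hypothetical helper). */
    static void reorderChildren(Element parent, List<String> names) {
        // Snapshot the child elements first, because appendChild changes the sibling chain.
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.add((Element) n);
            }
        }
        // Appending an existing child moves it to the end, so appending in
        // the expected order leaves the listed elements correctly ordered.
        for (String name : names) {
            for (Element child : children) {
                if (child.getTagName().equals(name)) {
                    parent.appendChild(child);
                }
            }
        }
    }

    /** Parses xml, reorders the children of the root element, and serializes the result. */
    static String reorder(String xml, List<String> names) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            reorderChildren(doc.getDocumentElement(), names);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not reorder document", e);
        }
    }

    public static void main(String[] args) {
        // The order X, Y is hard-coded here; in the real thing it would come from the schema.
        System.out.println(reorder("<A><Y/><X/></A>", Arrays.asList("X", "Y")));
    }
}
```

Children that are not named in the list stay where they are, ahead of the re-ordered ones, so this only gives a valid document once the list covers everything the schema allows inside the parent.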
It's important to understand what XSOM is here - it's a Java API which represents the information contained in an XML Schema, not the information contained in my instance document.
What I don't want to do is generate code from the schema, since the schema is unknown at build time. Furthermore, XSLT is no use, since the correct ordering of the elements is determined solely by the data dictionary contained in the schema.
Hopefully that's now explicit enough.
I don't have a good answer to this yet, but I have to note that there is potential for ambiguity here. Consider this schema:
<xs:element name="root">
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <xs:element name="foo"/>
        <xs:element name="bar">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="dee"/>
              <xs:element name="dum"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:sequence>
        <xs:element name="bar">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="dum"/>
              <xs:element name="dee"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="foo"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>
and this input XML:
<root>
<foo/>
<bar>
<dum/>
<dee/>
</bar>
</root>
This could be made to comply with the schema either by reordering <foo> and <bar>, or by reordering <dee> and <dum>. There doesn't seem to be any reason to prefer one over another.
I was stuck with the same problem for around two weeks. Finally I got the breakthrough. This can be achieved using JAXB's marshalling/unmarshalling feature.
In JAXB marshal/unmarshal, XML validation is an optional feature. So when creating the Marshaller and Unmarshaller objects, we do not call the setSchema(schema) method. Omitting this step switches validation off, so the unmarshaller accepts the out-of-order elements, and the marshaller then writes them back in the order fixed by the generated classes.
So now:
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;

import org.apache.commons.io.FileUtils;
import org.xml.sax.InputSource;

public class JAXBSequenceUtil {

    public static void main(String[] args) throws JAXBException, IOException {
        String xml = FileUtils.readFileToString(new File(
                "./conf/out/Response_103_1015700001&^&IOF.xml"));
        System.out.println("Before marshalling : \n" + xml);
        String sequencedXml = correctSequence(xml,
                "org.acord.standards.life._2");
        System.out.println("After marshalling : \n" + sequencedXml);
    }

    /**
     * @param xml
     *            - XML string to be corrected for sequence.
     * @param jaxbPackage
     *            - package containing JAXB generated classes using XSD.
     * @return String - xml with corrected sequence
     * @throws JAXBException
     */
    public static String correctSequence(String xml, String jaxbPackage)
            throws JAXBException {
        JAXBContext jaxbContext = JAXBContext.newInstance(jaxbPackage);
        // No setSchema(...) call on the unmarshaller, so the input is not validated.
        Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
        Object txLifeType = unmarshaller.unmarshal(new InputSource(
                new StringReader(xml)));
        System.out.println(txLifeType);
        StringWriter stringWriter = new StringWriter();
        // Marshalling writes the elements in the order of the generated classes.
        Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.marshal(txLifeType, stringWriter);
        return stringWriter.toString();
    }
}
Your problem translates to this: you have an XML file that doesn't match the schema and you want to transform it to something that's valid.
With XSOM, you can read the structure in the XSD and perhaps analyze the XML, but you would still need an additional mapping from the invalid form to the valid form. A stylesheet would be much easier, because you would walk through the XML, using XPath expressions to handle the elements in the proper order. With an XML where you want apples before pears, the stylesheet would first copy the apple node (/Fruit/Apple) before it copies the pear node. That way, no matter what the order is in the old file, they would be in the correct order in the new file.
What you could do with XSOM is to read the XSD and generate the stylesheet that will re-order the data. Then transform the XML using that stylesheet. Once XSOM has generated a stylesheet for the XSD, you can just re-use the stylesheet until the XSD is modified or another XSD is needed.
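A rough sketch of that idea in Java, using only the JDK's built-in XSLT support. The schema-reading part is left out: the map from parent name to ordered child names is assumed to come from XSOM or something similar, and ReorderStylesheet, generate and transform are hypothetical names. Namespaces are ignored, and any text sitting directly inside a re-ordered parent is dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReorderStylesheet {

    /** Builds an XSLT 1.0 stylesheet with one re-ordering template per parent element. */
    static String generate(Map<String, List<String>> childOrder) {
        StringBuilder xsl = new StringBuilder();
        xsl.append("<xsl:stylesheet version=\"1.0\" ")
           .append("xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n");
        // Identity template: copies anything that has no more specific template.
        xsl.append("<xsl:template match=\"@*|node()\">")
           .append("<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>")
           .append("</xsl:template>\n");
        for (Map.Entry<String, List<String>> entry : childOrder.entrySet()) {
            // A name test has a higher default priority than node(), so this
            // template wins over the identity template for the parent element.
            xsl.append("<xsl:template match=\"").append(entry.getKey()).append("\">")
               .append("<xsl:copy><xsl:apply-templates select=\"@*\"/>");
            for (String child : entry.getValue()) {
                xsl.append("<xsl:apply-templates select=\"").append(child).append("\"/>");
            }
            xsl.append("</xsl:copy></xsl:template>\n");
        }
        return xsl.append("</xsl:stylesheet>").toString();
    }

    /** Applies the stylesheet to the XML and returns the result as a string. */
    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for what would be read from the XSD: Fruit must contain Apple, then Pear.
        Map<String, List<String>> order = new LinkedHashMap<>();
        order.put("Fruit", Arrays.asList("Apple", "Pear"));
        String stylesheet = generate(order);
        System.out.println(stylesheet);
        System.out.println(transform("<Fruit><Pear/><Apple/></Fruit>", stylesheet));
    }
}
```

The generated stylesheet string can be cached per XSD, which is the re-use mentioned above.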
Of course, you could use XSOM to copy nodes immediately in the right order. But since this means your code has to walk through all nodes and child nodes itself, it might take some time to finish. A stylesheet would do the same, but the transformer will probably be able to process it all faster. It can work directly on the data, while the Java code would have to get/set every node through the XMLDocument properties.
When I think about this, I wonder if this has been done before already. It would be very generic and would be able to handle almost every XSD/XML.
Let's see... Using "//xsd:element/@name" you would get all element names in the schema. Every unique name would need to be translated to a template. Within these templates, you would need to process the child nodes of the specific element, which is slightly more complex to get. Elements can have a reference, which you would need to follow. Otherwise, get all child xsd:element nodes of it.
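To make the XPath part concrete, here is a small Java sketch using the JDK's javax.xml.xpath. It treats the XSD as an ordinary XML document. The class SchemaNames and its methods are hypothetical, and childNames only handles the simple case of an anonymous complexType with a single xsd:sequence: it does not follow ref or type attributes, which a real implementation would need to do.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SchemaNames {

    private static final String XSD_NS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    /** Binds the "xsd" prefix used in the XPath expressions to the XML Schema namespace. */
    private static final NamespaceContext XSD_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "xsd".equals(prefix) ? XSD_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XSD_NS.equals(uri) ? "xsd" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XSD_NS.equals(uri)
                    ? Collections.singletonList("xsd").iterator()
                    : Collections.<String>emptyList().iterator();
        }
    };

    /** Evaluates an XPath that selects attributes and returns their values. */
    static List<String> select(String schemaXml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the prefixed steps to match
            Document schema = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XSD_CONTEXT);
            NodeList hits = (NodeList) xpath.evaluate(expression, schema, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                values.add(hits.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("could not query schema", e);
        }
    }

    /** All element names declared anywhere in the schema. */
    static List<String> allElementNames(String schemaXml) {
        return select(schemaXml, "//xsd:element/@name");
    }

    /** Names of the elements declared directly in the sequence of the given parent. */
    static List<String> childNames(String schemaXml, String parent) {
        return select(schemaXml, "//xsd:element[@name='" + parent
                + "']/xsd:complexType/xsd:sequence/xsd:element/@name");
    }
}
```

The prefix in the XPath does not have to match the prefix used in the XSD file, since matching is done on the namespace URI. The sketch relies on the JDK's XPath implementation returning node sets in document order, which is what gives childNames the declared sequence order.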
Basically you want to take the root element and from there recursively look at the children in the document and the children defined in the schema and make the order match.
I'll give you a C#-syntax solution, since that's what I code in day and night, and it's pretty close to Java. Note that I'll have to take guesses about XSOM since I don't know its API. I've also made up the XML DOM methods, since giving you the C# ones probably wouldn't help :)
// assume first call is SortChildrenIntoNewDocument( sourceDom.DocumentElement, targetDom.DocumentElement, schema.RootElement )
public void SortChildrenIntoNewDocument( XmlElement source, XmlElement target, SchemaElement schemaElement )
{
    // whatever method you use to ask the XSOM to tell you the correct contents
    SchemaElement[] orderedChildren = schemaElement.GetChildren();
    for( int i = 0; i < orderedChildren.Length; i++ )
    {
        XmlElement sourceChild = source.SelectChildByName( orderedChildren[ i ].Name );
        // skip elements the schema allows but the source does not contain
        if( sourceChild == null ) continue;
        XmlElement targetChild = target.AddChild( sourceChild );
        // recursive call
        SortChildrenIntoNewDocument( sourceChild, targetChild, orderedChildren[ i ] );
    }
}
I wouldn't recommend a recursive method if it's going to be a deep tree; in that case you would have to create some 'tree walker' type objects. The advantage of that approach is that you'll be able to handle more complex things: for example, when the schema says you can have 0-or-more of an element, you can keep processing source nodes until there are no more that match, then move the schema walker on from there.
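Here is roughly what a non-recursive version could look like in Java, with an explicit stack in place of the call stack and with every occurrence of a repeated element moved. It is only a sketch under assumptions: a plain map from element name to ordered child names stands in for whatever the schema API returns, each element name is assumed to have the same content model wherever it appears, namespaces are ignored, and IterativeSorter is a made-up name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class IterativeSorter {

    /** Reorders the children of every element in place, without recursion. */
    static void sort(Element root, Map<String, List<String>> childOrder) {
        Deque<Element> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Element parent = pending.pop();
            // Snapshot the child elements before moving any of them.
            List<Element> children = new ArrayList<>();
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    children.add((Element) n);
                }
            }
            List<String> order =
                    childOrder.getOrDefault(parent.getTagName(), Collections.emptyList());
            for (String name : order) {
                // Move every match, so an element allowed 0-or-more times keeps
                // all of its occurrences in their original relative order.
                for (Element child : children) {
                    if (child.getTagName().equals(name)) {
                        parent.appendChild(child);
                    }
                }
            }
            // Queue the children so that their own contents get sorted too.
            for (Element child : children) {
                pending.push(child);
            }
        }
    }

    /** Convenience wrapper: parse, sort, serialize. */
    static String sort(String xml, Map<String, List<String>> childOrder) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            sort(doc.getDocumentElement(), childOrder);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not sort document", e);
        }
    }
}
```

Unlike the version above it sorts the source document in place instead of copying into a second document, which keeps the sketch short.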