
How to preserve the attribute order in an XML file after parsing and modification in Java?

First of all, some background.

I am aware that there are several near-identical questions on the site, but in none of them have I found a definitive solution to the problem. I know that the order of attributes in an XML file is irrelevant for data consistency and for integration with software that treats XML as XML rather than as strings. However, I have to preserve it, because I am going to modify files that operators will check visually with WinMerge or with Tortoise's "check for modifications" command. I have tried DOM, StAX and JDOM, with poor results. In files where I only have to modify the text of an element I have no problem, and if the formatting comes out different I can easily fix it by treating the file as a string.

With attributes it is more complicated. They are written back in a different order (please do not debate whether this is correct; that is not relevant to the question), and in WinMerge it looks as if the whole document had been modified.

The code is intentionally unreadable.

Here is a (trimmed, with semi-random text content) example of my XML before and after the modification:

    <?xml version="1.0" encoding="UTF-8"?>
    <sca:composite xmi:version="2.0" 
      xmlns:xmi="http://www.omg.org/XMI" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:BW="http://xsd.tns.tibco.com/amf/models/sca/implementationtype/BW" xmlns:XMLSchema="http://www.w3.org/2001/XMLSchema" 
      xmlns:compositeext="http://schemas.tibco.com/amx/3.0/compositeext" 
      xmlns:productAvailabilityResp="http://www.example.org/ERTETERET" 
      xmlns:property="http://ns.tibco.com/bw/property" 
      xmlns:rest="http://xsd.tns.tibco.com/bERTERTETE" 
      xmlns:sca="http://www.3453434FDSSDFSD.org/xmlns/sca/1.0" 
      xmlns:scact="http://xsd.tns.tibco.com/23E23E2E23Ee" 
      xmlns:scaext="http://2D2333DD32s" 
      xmi:id="_uKDz4IaiEeipW88nT3HxEA" 
      targetNamespace="http://tns.tibco.com/D23D32DD2232D2D2" 
      name="Q1231W1y" compositeext:version="1.0.0" 
      compositeext:description="TO EDIT VALUE" 
      compositeext:formatVersion="2">
    </sca:composite>

and

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <sca:composite xmlns:sca="http://www.SDFSDF.org/xmlns/sca/1.0"
        xmlns:BW="http://xsd.tns.tibco.com/amf/models/sca/SDFS/BW"
        xmlns:XMLSchema="http://www.w3.org/2001/XMLSchema"
        xmlns:compositeext="http://schemas.tibco.com/amx/3.0/compositeext"
        xmlns:productAvailabilityResp="http://www.example.org/SDFSDFSD"
         xmlns:property="http://ns.tibco.com/bw/property"
         xmlns:rest="http://xsd.tns.tibco.com/SDFSF"
         xmlns:scact="http://xsd.tns.tibco.com/amf/models/sca/SDFSD"
         xmlns:scaext="http://xsd.tns.tibco.com/amf/models/sca/extensions"
         xmlns:xmi="http://www.omg.org/XMI"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         compositeext:description="test EDITED VALUE"
         compositeext:formatVersion="2"
         compositeext:version="1.0.0"
         name="ERFERFRFE"
         targetNamespace="http://tns.tibco.com/bw/composite/ERFERFREy"
         xmi:id="_uKDz4IaiEeipW88nT3HxEA"
         xmi:version="2.0">
    </sca:composite>

Could we try together to find a solution?

Edit, as suggested by Federico:

What I need to do is change the value of a single attribute and the text content of an element. I can do both of those things, but when I write the file back I find a different attribute order and different formatting:

    <?xml version="1.0" encoding="UTF-8"?>
    <sca:composite //same attributes
      compositeext:description="TO EDIT VALUE"
      //same other attributes>

    other stuff

    </sca:composite>

PS: my intent is to build a versioning tool for TIBCO BW6 projects outside the designer.

From my understanding, your program reads the XML input stream from a file with StAX, DOM or SAX, then modifies some elements or attributes, and finally writes the data to another XML file.

A requirement is that the detailed structure of the output file resembles that of the input file as closely as possible after the changes are made. That means, among other conditions, that elements and attributes have to be in the same order in the output document as they were in the input document.

XML demands that the sequence of elements remains as is, but (as you said already) the attributes can be in any order without any influence on the semantics of the XML document.

Your problem is that neither DOM nor SAX nor StAX allows you to influence the sequence of the attributes of an element.

Does this description match your problem?


I am using a large XML file as a "poor man's database"; that is, I edit the XML file with a text editor and have a bunch of little programs that create reports from it. One of these sorts the "records" in the XML file, which requires reading it, manipulating the data and writing it back afterwards.

I had the same (or at least a similar) issue as you: some attributes end up at arbitrary locations afterwards. When searching through the text file in the editor, this causes a lot of friction.

So instead of using SAX, DOM or StAX for the output, I wrote my own library, which defines a comparator for each element type that is used to sort the attributes of that element type.

Some implementations of the comparator use a list of attribute names that defines the order, which allows me to have the attributes ordered like this:

    <element sortkey="…" id="…" subject="…" date="…" parent="…" …

If you treat the xmi:… things and the namespace definitions all as attributes, the code for such an "XMLWriter" is quite straightforward.
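
As a rough illustration of the comparator idea described above, here is a minimal Java sketch. The class `OrderedAttributeWriter` and its methods `byNameList`, `escape` and `startTag` are hypothetical names invented for this example; they are not taken from my library or from any XML API. It assumes the attributes of one element are already available as a name-to-value map, with namespace declarations treated as ordinary attributes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: writes a start tag with attributes in a caller-defined order.
class OrderedAttributeWriter {

    // Ranks attribute names by their position in preferredOrder.
    // Names that are not in the list go last, sorted alphabetically.
    static Comparator<String> byNameList(List<String> preferredOrder) {
        return Comparator
                .comparingInt((String name) -> {
                    int index = preferredOrder.indexOf(name);
                    return index < 0 ? Integer.MAX_VALUE : index;
                })
                .thenComparing(Comparator.<String>naturalOrder());
    }

    // Escapes the characters that must not appear raw in a double-quoted attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    // Builds the start tag, sorting the attribute names with the given comparator.
    static String startTag(String elementName, Map<String, String> attributes,
                           Comparator<String> order) {
        List<String> names = new ArrayList<>(attributes.keySet());
        names.sort(order);
        StringBuilder tag = new StringBuilder("<").append(elementName);
        for (String name : names) {
            tag.append(' ').append(name)
               .append("=\"").append(escape(attributes.get(name))).append('"');
        }
        return tag.append('>').toString();
    }

    public static void main(String[] args) {
        // A TreeMap hands the names over alphabetically, like the output in the question.
        Map<String, String> attributes = new TreeMap<>();
        attributes.put("date", "2020-01-01");
        attributes.put("id", "42");
        attributes.put("sortkey", "a");
        Comparator<String> order = byNameList(Arrays.asList("sortkey", "id", "date"));
        System.out.println(startTag("element", attributes, order));
    }
}
```

One comparator per element type can then be kept in a map keyed by element name; the sketch only covers the start tag, not text content or nesting.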

If the order of the attributes may differ for each individual element (even among those with the same name), you have to modify that approach so that the attribute sequence is stored with each element instance on reading.
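
Storing the sequence per element instance could look roughly like the following sketch. `OrderedElement` is a hypothetical class made up for this example. It relies on `java.util.LinkedHashMap`, which iterates in insertion order and keeps a key's position when its value is replaced. It also assumes that whatever reads the file hands the attributes over in document order, which the XML APIs do not promise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical element model that remembers the order in which attributes were read.
class OrderedElement {
    final String name;
    // Insertion-ordered: updating an existing key does not move it.
    final Map<String, String> attributes = new LinkedHashMap<>();

    OrderedElement(String name) {
        this.name = name;
    }

    // Adds a new attribute at the end, or updates an existing one in place.
    OrderedElement attr(String attributeName, String value) {
        attributes.put(attributeName, value);
        return this;
    }

    // Escapes the characters that must not appear raw in a double-quoted attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    // Writes the start tag with the attributes in the remembered order.
    String toStartTag() {
        StringBuilder tag = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            tag.append(' ').append(entry.getKey())
               .append("=\"").append(escape(entry.getValue())).append('"');
        }
        return tag.append('>').toString();
    }

    public static void main(String[] args) {
        OrderedElement element = new OrderedElement("sca:composite")
                .attr("xmi:version", "2.0")
                .attr("name", "Q1231W1y")
                .attr("compositeext:description", "TO EDIT VALUE");
        // The edited attribute stays where it was first inserted.
        element.attr("compositeext:description", "test EDITED VALUE");
        System.out.println(element.toStartTag());
    }
}
```

This keeps the attribute order but not the original line breaks and indentation between attributes, so a diff tool may still flag whitespace changes.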


But perhaps XML processing is not the right approach for you at all …

Maybe an approach like that of sed or awk fits your needs better.

This basically means that you search for a certain sequence in the text file (using a regular expression, a line and column number, or a combination of both), replace what you find there, and start over for the next change at another location.
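
The search-and-replace idea can be sketched in plain Java with `java.util.regex`. The class `InPlaceAttributeEdit` and its method `replaceAttributeValue` are hypothetical names for this example. The sketch assumes double-quoted attribute values and changes only the first occurrence of the attribute name, so it would need a narrower pattern if the same attribute appears on several elements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: edits one attribute value in the raw text, leaving every other byte alone.
class InPlaceAttributeEdit {

    // Escapes the characters that must not appear raw in a double-quoted attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    // Replaces the value of the first attributeName="..." found in xml.
    // Returns xml unchanged when the attribute is not found.
    static String replaceAttributeValue(String xml, String attributeName, String newValue) {
        // Leading \s makes sure the whole attribute name is matched, not a suffix of another name.
        Pattern pattern = Pattern.compile(
                "\\s" + Pattern.quote(attributeName) + "\\s*=\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(xml);
        if (!matcher.find()) {
            return xml;
        }
        // Splice the new value between the quotes; everything else is copied verbatim.
        return xml.substring(0, matcher.start(1))
                + escape(newValue)
                + xml.substring(matcher.end(1));
    }

    public static void main(String[] args) {
        String xml = "<sca:composite xmi:version=\"2.0\"\n"
                + "  name=\"Q1231W1y\" compositeext:version=\"1.0.0\"\n"
                + "  compositeext:description=\"TO EDIT VALUE\"\n"
                + "  compositeext:formatVersion=\"2\">\n"
                + "</sca:composite>";
        System.out.println(
                replaceAttributeValue(xml, "compositeext:description", "test EDITED VALUE"));
    }
}
```

Because the file is never parsed and re-serialized, attribute order, line breaks and indentation stay exactly as they were, which is what a visual diff needs.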


Edit: I did not mean to integrate either sed or awk into the solution; what I meant was to adopt only the basic approach of how these tools work and to implement that in the program. Both tools are really powerful, but from what I understand only a fraction of their features is needed, so a full integration of one or the other into the program might be overkill. Nevertheless, it is possible: a starting point for an integration of awk is awk.sourceforge.net. It can even be integrated through JSR-223 (Scripting). For an integration of sed, a look at the tools4j/unix4j project on GitHub could be helpful.
