
How to preserve the attributes order in an XML after parsing and modifications in Java?

First of all, some premises.

I am aware that several questions identical to this one exist on the site, but none of them gave me a definitive solution to the problem. I know that the order of the attributes in an XML file is completely irrelevant to data consistency and to integration with software that actually treats XML as XML rather than as strings. However, I have to preserve it, because the files I modify will be checked visually by operators with WinMerge or with Tortoise's "Check for modifications" command. I have tried libraries such as DOM, StAX and JDOM, with poor results. In files where I only have to modify the text of an element I have no problem, and if some formatting differs I can easily fix it by treating the file as a string.

With attributes it is more complicated. They are written back in a different order (please do not debate whether this is correct; it is not the point of the question), and in WinMerge it looks as if the whole document had been modified.

The code is intentionally unreadable.

Here is an example of my XML before and after the modification (trimmed, and with semi-random text content):

    <?xml version="1.0" encoding="UTF-8"?>
    <sca:composite xmi:version="2.0" 
      xmlns:xmi="http://www.omg.org/XMI" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:BW="http://xsd.tns.tibco.com/amf/models/sca/implementationtype/BW" xmlns:XMLSchema="http://www.w3.org/2001/XMLSchema" 
      xmlns:compositeext="http://schemas.tibco.com/amx/3.0/compositeext" 
      xmlns:productAvailabilityResp="http://www.example.org/ERTETERET" 
      xmlns:property="http://ns.tibco.com/bw/property" 
      xmlns:rest="http://xsd.tns.tibco.com/bERTERTETE" 
      xmlns:sca="http://www.3453434FDSSDFSD.org/xmlns/sca/1.0" 
      xmlns:scact="http://xsd.tns.tibco.com/23E23E2E23Ee" 
      xmlns:scaext="http://2D2333DD32s" 
      xmi:id="_uKDz4IaiEeipW88nT3HxEA" 
      targetNamespace="http://tns.tibco.com/D23D32DD2232D2D2" 
      name="Q1231W1y" compositeext:version="1.0.0" 
      compositeext:description="TO EDIT VALUE" 
      compositeext:formatVersion="2">
    </sca:composite>

and

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <sca:composite xmlns:sca="http://www.SDFSDF.org/xmlns/sca/1.0"
      xmlns:BW="http://xsd.tns.tibco.com/amf/models/sca/SDFS/BW"
      xmlns:XMLSchema="http://www.w3.org/2001/XMLSchema"
      xmlns:compositeext="http://schemas.tibco.com/amx/3.0/compositeext"
      xmlns:productAvailabilityResp="http://www.example.org/SDFSDFSD"
      xmlns:property="http://ns.tibco.com/bw/property"
      xmlns:rest="http://xsd.tns.tibco.com/SDFSF"
      xmlns:scact="http://xsd.tns.tibco.com/amf/models/sca/SDFSD"
      xmlns:scaext="http://xsd.tns.tibco.com/amf/models/sca/extensions"
      xmlns:xmi="http://www.omg.org/XMI"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      compositeext:description="test EDITED VALUE"
      compositeext:formatVersion="2"
      compositeext:version="1.0.0"
      name="ERFERFRFE"
      targetNamespace="http://tns.tibco.com/bw/composite/ERFERFREy"
      xmi:id="_uKDz4IaiEeipW88nT3HxEA"
      xmi:version="2.0">
    </sca:composite>

Can we try to find a solution together?

Edit, as suggested by Federico:

What I need to do is change the value of a single attribute and the text content of an element, and I can do both of those things. But when I write the file back, the attributes come out in a different order and with different formatting:

    <?xml version="1.0" encoding="UTF-8"?>
    <sca:composite //same attributes
      compositeext:description="TO EDIT VALUE"
      //same other attributes>

    other stuff

    </sca:composite>

PS: my intent is to build a versioner for TIBCO BW6 projects that works outside the Designer.

From my understanding, your program reads the XML input stream from a file with StAX, DOM or SAX, then you make some modifications to elements or attributes, and finally your program writes the data to another XML file.

A requirement is that, after the changes, the detailed structure of the output file resembles that of the input file as closely as possible. Among other conditions, this means that elements and attributes must appear in the output document in the same order as in the input document.

XML requires that the sequence of elements stays as it is, but (as you already said) attributes can appear in any order without any influence on the semantics of the XML document.

Your problem is that none of DOM, SAX or StAX lets you influence the order of an element's attributes.

Does this description match your problem?


I use a large XML file as a "poor man's database": I edit that XML file with a text editor, and I have a bunch of small programs that create reports from it. One of them sorts the "records" in the XML file, which requires reading it, manipulating the data and writing it back afterwards.

I had the same (or at least a similar) issue as you: afterwards, some attributes ended up at arbitrary positions. When searching through the text file in the editor, this causes a lot of friction.

So instead of using SAX, DOM or StAX for the output, I wrote my own library, which defines a comparator for each element type that is used to sort the attributes of that element type.

Some implementations of the comparator used a list of attribute names that defined the order, which allowed me to have the attributes ordered like this:

    <element sortkey="…" id="…" subject="…" date="…" parent="…" …
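A minimal sketch of such a list-driven comparator, assuming attribute names are compared as plain qualified-name strings. The class name `AttributeOrder`, the method `byNameList` and the fallback rule (names missing from the list go last, alphabetically) are illustrative assumptions, not the actual library described here:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: orders attribute names by their position in a fixed list. */
final class AttributeOrder {

    static Comparator<String> byNameList(List<String> preferred) {
        // Remember each preferred name's position for constant-time lookups.
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < preferred.size(); i++) {
            rank.putIfAbsent(preferred.get(i), i);
        }
        // Names not in the list get the largest rank, then are ordered alphabetically.
        return Comparator.<String>comparingInt(n -> rank.getOrDefault(n, Integer.MAX_VALUE))
                .thenComparing(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        // One comparator per element type, keyed by element name (names are examples).
        Map<String, Comparator<String>> perElement = Map.of(
                "element", byNameList(List.of("sortkey", "id", "subject", "date", "parent")));

        List<String> attrs = new ArrayList<>(List.of("parent", "zzz", "date", "id", "sortkey", "subject"));
        attrs.sort(perElement.get("element"));
        System.out.println(attrs); // [sortkey, id, subject, date, parent, zzz]
    }
}
```

Keeping the comparators in a map keyed by element name is one way to get "a comparator for each element type"; the fallback rule only matters for attributes the list does not mention.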

If you treat the xmi:… things and the namespace definitions all as attributes, the code for such an "XMLWriter" is quite straightforward.
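A sketch of the start-tag part of such a writer, assuming that namespace declarations (`xmlns:…`) and prefixed attributes (`xmi:…`) are simply keys in one map, so a single comparator orders all of them. `StartTagWriter` and `startTag` are hypothetical names, and a complete writer would also need text, end tags and comments:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal writer for one start tag; attribute order comes from a comparator. */
final class StartTagWriter {

    /** Escapes the characters that may not appear literally in a double-quoted attribute value. */
    static String escapeAttr(String v) {
        StringBuilder sb = new StringBuilder(v.length());
        for (char ch : v.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Writes <name a="..." ...>; "xmlns:x" and "xmi:id" are ordinary keys, sorted like any other. */
    static String startTag(String name, Map<String, String> attrs, Comparator<String> order) {
        List<String> keys = new ArrayList<>(attrs.keySet());
        keys.sort(order);
        StringBuilder sb = new StringBuilder("<").append(name);
        for (String k : keys) {
            sb.append(' ').append(k).append("=\"").append(escapeAttr(attrs.get(k))).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of(
                "name", "Q1231W1y",
                "xmlns:xmi", "http://www.omg.org/XMI",
                "xmi:version", "2.0");
        // Example order: xmi:version first, then everything else alphabetically.
        Comparator<String> order = Comparator.<String>comparingInt(k -> k.equals("xmi:version") ? 0 : 1)
                .thenComparing(Comparator.naturalOrder());
        System.out.println(startTag("sca:composite", attrs, order));
        // <sca:composite xmi:version="2.0" name="Q1231W1y" xmlns:xmi="http://www.omg.org/XMI">
    }
}
```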

If the order of the attributes may differ for each individual element (even among elements with the same name), you have to modify that approach so that you store the attribute sequence with each element instance when reading.
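One way to capture that per-instance sequence on reading is sketched below with the JDK's SAX parser. It assumes the parser reports attributes in document order: the SAX specification leaves that order unspecified, although, as far as I know, the JDK's built-in parser follows the source order. With namespace awareness switched off, the `xmlns:…` declarations arrive as ordinary attributes, interleaved with the others as in the source. `AttributeOrderRecorder` is a hypothetical name:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: records each element instance's attribute names in the order the parser reports them. */
final class AttributeOrderRecorder {

    /** Returns one list of attribute qNames per element, in the order the elements start. */
    static List<List<String>> record(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Not namespace-aware: "xmlns:..." declarations are reported as ordinary attributes.
        factory.setNamespaceAware(false);
        List<List<String>> perElement = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                List<String> names = new ArrayList<>();
                for (int i = 0; i < atts.getLength(); i++) {
                    names.add(atts.getQName(i)); // raw name, e.g. "xmi:id" or "xmlns:xmi"
                }
                perElement.add(names);
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return perElement;
    }

    public static void main(String[] args) {
        String xml = "<sca:composite xmi:version=\"2.0\" xmlns:xmi=\"http://www.omg.org/XMI\" name=\"Q1231W1y\">"
                + "<child b=\"1\" a=\"2\"/></sca:composite>";
        // One entry per element instance, each holding that instance's attribute sequence.
        System.out.println(record(xml));
    }
}
```

When writing back, each recorded list can take the place of the per-type comparator for its element instance.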


But perhaps XML processing is not the right approach for you at all …

Maybe an approach like that of sed or awk fits your needs better.

Basically, this means that you search for a certain sequence in the text file (using a regular expression, line and column numbers, or a combination of both), replace what you find there, and start over for the next change at another location.
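A minimal Java sketch of that text-level approach, applied to the question's case of changing one attribute value. It assumes the value is double-quoted and that the first match of the attribute name is the one to change. `AttributeTextEdit` and `replaceAttributeValue` are hypothetical names. Everything outside the matched value, including attribute order, whitespace and line breaks, is copied unchanged:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical text-level edit: replaces one attribute's value and leaves all other text as-is. */
final class AttributeTextEdit {

    static String replaceAttributeValue(String text, String attrName, String newValue) {
        // (?<![\w:.-]) keeps "version" from matching inside "xmi:version"; spaces around '=' are allowed.
        Pattern p = Pattern.compile("(?<![\\w:.-])(" + Pattern.quote(attrName) + "\\s*=\\s*\")[^\"]*(\")");
        Matcher m = p.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("attribute not found: " + attrName);
        }
        // Keep group 1 (name=") and group 2 (closing quote); only the value between them changes.
        String replacement = m.group(1) + escape(newValue) + m.group(2);
        return text.substring(0, m.start()) + replacement + text.substring(m.end());
    }

    /** Minimal escaping so the new value is still a legal double-quoted attribute value. */
    static String escape(String v) {
        return v.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String before = "<sca:composite xmi:version=\"2.0\"\n"
                + "  compositeext:description=\"TO EDIT VALUE\"\n"
                + "  name=\"Q1231W1y\">";
        System.out.println(replaceAttributeValue(before, "compositeext:description", "EDITED VALUE"));
    }
}
```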


Edit: I did not mean to integrate sed or awk into the solution; what I meant was to adopt only the basic approach of how these tools work, and to implement that in the program. Both tools are really powerful, but from what I understand only a fraction of their features is needed, so fully integrating one or the other into the program might be overkill. Nevertheless, it is possible: a starting point for integrating awk is awk.sourceforge.net, and it can even be integrated through JSR-223 (Scripting). For integrating sed, a look at the tools4j/unix4j project on GitHub could be helpful.
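Implementing sed's basic idea directly in the program, without embedding sed or awk, can be as small as the following sketch of a line-addressed substitution, comparable to `sed '2s/re/repl/'`. The class and method names are hypothetical, and, as with sed's `s` without the `g` flag, only the first match on the addressed line is replaced:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sed-like helper: regex substitution restricted to one 1-based line. */
final class LineAddressedSubstitute {

    static String substituteOnLine(String text, int lineNo, String regex, String literalReplacement) {
        // Split after every '\n' so each piece keeps its own terminator ("\n" or "\r\n").
        String[] lines = text.split("(?<=\n)", -1);
        if (lineNo < 1 || lineNo > lines.length) {
            throw new IllegalArgumentException("no such line: " + lineNo);
        }
        Matcher m = Pattern.compile(regex).matcher(lines[lineNo - 1]);
        // quoteReplacement: '$' and '\' in the new text are taken literally.
        lines[lineNo - 1] = m.replaceFirst(Matcher.quoteReplacement(literalReplacement));
        // Re-joining the pieces reproduces every other line byte for byte.
        return String.join("", lines);
    }

    public static void main(String[] args) {
        String xml = "<sca:composite\n"
                + "  compositeext:description=\"TO EDIT VALUE\"\n"
                + "  name=\"Q1231W1y\">\n"
                + "</sca:composite>\n";
        // Roughly: sed '2s/"TO EDIT VALUE"/"EDITED VALUE"/'
        System.out.print(substituteOnLine(xml, 2, "\"TO EDIT VALUE\"", "\"EDITED VALUE\""));
    }
}
```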

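Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.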

 
© 2020-2024 STACKOOM.COM