How to modify a huge XML file by StAX?

I have a huge XML file (~2 GB) and I need to add new elements and modify existing ones. For example, I have:

<books>
    <book>....</book>
    ...
    <book>....</book>
</books>

And I want to get:

<books>
   <book>
      <index></index>
      ....
   </book>
   ...
   <book>
      <index></index>
      ....
   </book>
</books>

I used the following code:

XMLInputFactory inFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream(file));
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(new FileWriter(file, true));
while (eventReader.hasNext()) {
   XMLEvent event = eventReader.nextEvent();
   if (event.getEventType() == XMLEvent.START_ELEMENT) {
      if (event.asStartElement().getName().toString().equalsIgnoreCase("book")) {
          writer.writeStartElement("index");
          writer.writeEndElement();
       }
    }
}
writer.close();

But the result was the following:

<books>
   <book>....</book>
   ....
   <book>....</book>
</books><index></index>

Any ideas?

Try this:

    // Needs: java.io.*, javax.xml.stream.*, javax.xml.stream.events.XMLEvent
    // Read from "1.xml" and write to a different file; never read and write the same file.
    XMLInputFactory inFactory = XMLInputFactory.newInstance();
    XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream("1.xml"));
    XMLOutputFactory factory = XMLOutputFactory.newInstance();
    XMLEventWriter writer = factory.createXMLEventWriter(new FileWriter(file));
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();
    while (eventReader.hasNext()) {
        XMLEvent event = eventReader.nextEvent();
        // Copy every input event to the output unchanged.
        writer.add(event);
        // Right after each <book> start tag, insert an empty <index> element.
        if (event.isStartElement()
                && event.asStartElement().getName().getLocalPart().equals("book")) {
            writer.add(eventFactory.createStartElement("", "", "index"));
            writer.add(eventFactory.createEndElement("", "", "index"));
        }
    }
    writer.close();
    eventReader.close();

Notes:

new FileWriter(file, true) appends to the end of the file, which you almost certainly don't want here.

equalsIgnoreCase("book") is a bad idea because XML is case-sensitive.
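
To illustrate the second note, here is a small sketch of comparing element names exactly. The class name QNameCompare, the helper isBook, and the namespace urn:example:library are made up for this example. It also shows why comparing against QName.toString() is fragile: once the element has a namespace, the string includes it.

```java
import javax.xml.namespace.QName;

public class QNameCompare {

    /** Exact, case-sensitive match on the local name, ignoring any namespace. */
    static boolean isBook(QName name) {
        return "book".equals(name.getLocalPart());
    }

    public static void main(String[] args) {
        QName plain = new QName("book");
        QName upper = new QName("Book");
        // Hypothetical namespace, used only for illustration.
        QName namespaced = new QName("urn:example:library", "book");

        System.out.println(isBook(plain));       // true
        System.out.println(isBook(upper));       // false: <Book> is a different element
        System.out.println(isBook(namespaced));  // true
        System.out.println(namespaced);          // {urn:example:library}book
    }
}
```

If the document uses namespaces, comparing the whole QName (namespace URI plus local part) with equals is stricter than checking the local part alone.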

It is pretty clear why it behaves the way it does. You are opening the existing file in append mode and writing elements at the end, which contradicts what you are trying to do.

(Aside: I'm surprised that it works as well as it does, given that the input side is likely to see the elements that the output side is adding to the end of the file. Indeed, exceptions like the ones in Evgeniy Dorofeev's example are the sort of thing I'd expect. The problem is that if you attempt to read and write a text file at the same time, and either the reader or the writer uses any form of buffering, explicit or implicit, the reader is liable to see partial states.)

To fix this, you have to start by reading from one file and writing to a different file. Appending won't work. Then you have to arrange for the elements, attributes, content, etc. that are read from the input file to be copied to the output file. Finally, you need to add the extra elements at the appropriate points.
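
Those three steps could be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names (AddIndexToBooks, addIndex, rewriteInPlace) are invented, the output is always written as UTF-8, and the "modify the original file" effect is obtained by writing to a temporary file in the same directory and moving it over the original once the copy has finished.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class AddIndexToBooks {

    /** Copies in to out, inserting an empty <index> after every <book> start tag. */
    static void addIndex(InputStream in, OutputStream out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        // Assumption: UTF-8 output is acceptable for the document.
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                writer.add(event); // step 2: copy what was read
                if (event.isStartElement()
                        && "book".equals(event.asStartElement().getName().getLocalPart())) {
                    // step 3: add the extra element at the appropriate point
                    writer.add(events.createStartElement("", "", "index"));
                    writer.add(events.createEndElement("", "", "index"));
                }
            }
        } finally {
            writer.close();
            reader.close();
        }
    }

    /** Step 1: read one file, write another, then swap the result into place. */
    static void rewriteInPlace(Path file) throws IOException, XMLStreamException {
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "books", ".tmp");
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            addIndex(in, out);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because only one event is held at a time, memory use should stay roughly constant regardless of file size, but the disk needs room for a second copy of the file while the rewrite is in progress.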


And is there any possibility to open the XML file in a mode like RandomAccessFile, but write to it using StAX methods?

No. That is theoretically impossible. To navigate around an XML file's structure in a "random access" fashion, you'd first need to parse the whole thing and build an index of where all the elements are. Even when you've done that, the XML is still stored as characters in a file, and random access does not allow you to insert or remove characters in the middle of a file.
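
A small experiment makes the last point concrete. The sketch below (class and method names are invented for illustration) seeks to the position just after <book> in a tiny file and writes <index/> there with RandomAccessFile. The bytes that were already at that position are overwritten, not shifted to the right.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RandomAccessOverwrite {

    /** Writes text at the given byte offset and returns the resulting file content. */
    static String writeAt(Path file, long offset, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("books", ".xml");
        Files.write(file, "<book>text</book>".getBytes(StandardCharsets.UTF_8));
        // Try to "insert" <index/> right after <book>, i.e. at byte offset 6.
        System.out.println(writeAt(file, 6, "<index/>")); // prints <book><index/>ok>
        Files.delete(file);
    }
}
```

The result is no longer well-formed XML. To really insert, everything after the insertion point has to be rewritten, which for a change near the start of a 2 GB file amounts to copying the whole file anyway.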

Maybe your best bet would be combining XSL and a SAX-style parser, e.g. something along the lines of this IBM article: http://ibm.com/developerworks/xml/library/x-tiptrax
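
As a rough idea of what a SAX-based variant might look like, here is a sketch that is not taken from the IBM article. It swaps the XSL stylesheet for a plain identity Transformer and does the actual change in a SAX filter; the class name SaxIndexFilter and the method addIndex are invented. With the JDK's built-in implementation an identity transform fed from a SAXSource should pass events straight to the serializer, but that is implementation-dependent. A real XSLT stylesheet, by contrast, usually needs the whole document in memory, which matters at 2 GB.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxIndexFilter extends XMLFilterImpl {

    SaxIndexFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Forward the original start tag first.
        super.startElement(uri, localName, qName, atts);
        if ("book".equals(localName)) {
            // Then emit an extra, empty <index> element as the first child of <book>.
            super.startElement("", "index", "index", new AttributesImpl());
            super.endElement("", "index", "index");
        }
    }

    /** Parses in with SAX, filters the events, and serializes them to out. */
    static void addIndex(Reader in, Writer out) throws Exception {
        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true); // needed so that localName is populated
        XMLReader filter = new SaxIndexFilter(parsers.newSAXParser().getXMLReader());
        // A Transformer created without a stylesheet performs an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new SAXSource(filter, new InputSource(in)), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        addIndex(new StringReader("<books><book>a</book><book>b</book></books>"), out);
        System.out.println(out);
    }
}
```

For a real file, the StringReader and StringWriter would be replaced by streams over the input file and a separate output file.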

Maybe the StAX Read-and-Write Example in the Java EE tutorial helps: http://docs.oracle.com/javaee/5/tutorial/doc/bnbfl.html#bnbgq

You can download the tutorial examples here: https://java.net/projects/javaeetutorial/downloads
