Write huge XML file from DOM to file

I have a Java program that queries a table with millions of records and generates an XML file in which each record is a node.

The problem is that the program runs out of heap memory. I have allocated a 2 GB heap to it.

I am looking for alternative approaches to creating such a huge XML file.

Can we write out part of the DOM to the file and then release its memory?
For example: create 100 nodes in the DOM, write them to the file, release the memory, then create the next 100 nodes in the DOM, and so on.

Code to write a node to a file:

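Transformer transformer = TransformerFactory.newInstance().newTransformer();
// Suppress the <?xml ...?> declaration so that repeated fragments can be concatenated
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
DOMSource source = new DOMSource(node);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);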

But how do I release the DOM memory after writing the nodes to the file?
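The batching idea in the question can work as long as nothing keeps a reference to nodes that have already been written: once a batch is serialized, dropping the whole `Document` lets the garbage collector reclaim it. Below is a minimal sketch under stated assumptions. The class `BatchedDomWriter`, the element names (`records`, `record`, `name`) and the `String[]` row shape (standing in for a JDBC row) are all hypothetical. The root element is written by hand so that it never has to live in a DOM, and each fragment is serialized without its own XML declaration.

```java
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class BatchedDomWriter {

    // Hypothetical row shape: row[0] = id, row[1] = name (stands in for a JDBC ResultSet row).
    public static void write(Iterable<String[]> rows, int batchSize, Writer out) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Fragments are written one after another, so none may carry its own <?xml ...?> declaration.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        out.write("<records>"); // the root element is written by hand and never held in a DOM
        Document batch = newBatch(builder);
        int count = 0;
        for (String[] row : rows) {
            Element record = batch.createElement("record");
            record.setAttribute("id", row[0]);
            Element name = batch.createElement("name");
            name.setTextContent(row[1]);
            record.appendChild(name);
            batch.getDocumentElement().appendChild(record);
            if (++count == batchSize) {
                serialize(batch, transformer, out);
                batch = newBatch(builder); // the old Document is now unreachable and can be collected
                count = 0;
            }
        }
        if (count > 0) {
            serialize(batch, transformer, out); // leftover rows from the last, partial batch
        }
        out.write("</records>");
        out.flush();
    }

    // A fresh Document with a throwaway container element that is itself never serialized.
    private static Document newBatch(DocumentBuilder builder) {
        Document doc = builder.newDocument();
        doc.appendChild(doc.createElement("batch"));
        return doc;
    }

    // Serializes each child of the container, appending the output to the same Writer.
    private static void serialize(Document batch, Transformer transformer, Writer out) throws Exception {
        for (Node n = batch.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            transformer.transform(new DOMSource(n), new StreamResult(out));
        }
    }
}
```

Even so, building and discarding a small DOM per batch is extra work compared with writing the records to the stream directly.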

Why do you need to generate a DOM? Try writing the XML directly. The most convenient API for outputting XML from Java is the StAX XMLStreamWriter interface. Several implementations of XMLStreamWriter generate lexical (serialized) XML, including the Saxon serializer, which gives you considerable control over how the output is serialized (e.g. indentation and encoding) if you need it.
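A minimal sketch of the XMLStreamWriter approach. It uses whatever implementation `XMLOutputFactory.newInstance()` returns (the JDK's built-in one by default); the Saxon serializer mentioned above is obtained through Saxon's own API and is not shown. The class name, element names and `String[]` row shape are hypothetical; in the real program the loop would iterate over the JDBC `ResultSet`. Heap use stays flat because each record goes to the stream as soon as it is read.

```java
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxRecordWriter {

    // Hypothetical row shape: row[0] = id, row[1] = name.
    public static void write(Iterable<String[]> rows, OutputStream os) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(os, "UTF-8");
        xml.writeStartDocument("UTF-8", "1.0"); // declared encoding matches the stream encoding
        xml.writeStartElement("records");
        for (String[] row : rows) {
            // Each row is written and forgotten; nothing accumulates in memory.
            xml.writeStartElement("record");
            xml.writeAttribute("id", row[0]);
            xml.writeStartElement("name");
            xml.writeCharacters(row[1]); // the writer escapes markup characters such as & and <
            xml.writeEndElement();
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // releases the writer; the underlying OutputStream stays open
    }
}
```

For a file, pass a `BufferedOutputStream` wrapping a `FileOutputStream` and close it after `write` returns.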

I would use a simple OutputStreamWriter and format the XML myself; you don't need to build a huge DOM structure. I think this is the fastest way.

Of course, it depends on how much XML structure you need. If one table row corresponds to one XML line, this should be the fastest approach.
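A minimal sketch of the hand-formatted approach, assuming one table row per XML line. The class name, element names and `String[]` row shape are hypothetical. The catch with formatting XML yourself is escaping: any `&`, `<`, `>` or quote character in the data must become an entity reference, which this sketch does. It does not filter out control characters that XML 1.0 forbids outright, so data containing those would still produce a malformed file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class PlainXmlWriter {

    // Hypothetical row shape: row[0] = id, row[1] = name.
    public static void write(Iterable<String[]> rows, OutputStream os) throws IOException {
        Writer out = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<records>\n");
        for (String[] row : rows) {
            // One table row -> one line of XML
            out.write("  <record id=\"" + escape(row[0]) + "\"><name>" + escape(row[1]) + "</name></record>\n");
        }
        out.write("</records>\n");
        out.flush(); // push buffered text to the stream; closing it is left to the caller
    }

    // Replaces the five predefined-entity characters; safe for both text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```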

For processing a huge document, SAX is often preferred precisely because it keeps in memory only what you have explicitly decided to keep, which means you can use a specialized, and hence smaller, data model. For tasks such as this one, where you never need to cross-reference different parts of the document, you may not need any data model at all: you can generate SAX events directly from the input data and feed them into the serializer.
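A minimal sketch of feeding SAX events straight into a serializer with no data model in between. The serializer here is the JAXP identity `TransformerHandler` obtained from `SAXTransformerFactory`; that choice, the class name, element names and `String[]` row shape are assumptions made for illustration.

```java
import java.io.Writer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class SaxRecordWriter {

    // Hypothetical row shape: row[0] = id, row[1] = name.
    public static void write(Iterable<String[]> rows, Writer out) throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        // An identity TransformerHandler is a ContentHandler that serializes every event it receives.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "records", "records", noAttrs);
        for (String[] row : rows) {
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "id", "id", "CDATA", row[0]);
            handler.startElement("", "record", "record", attrs);
            handler.startElement("", "name", "name", noAttrs);
            char[] text = row[1].toCharArray();
            handler.characters(text, 0, text.length); // the serializer does the escaping
            handler.endElement("", "name", "name");
            handler.endElement("", "record", "record");
        }
        handler.endElement("", "records", "records");
        handler.endDocument(); // completes the output
    }
}
```

The explicit namespace, local-name and qualified-name arguments plus an `Attributes` object for every element are the main source of verbosity here, which is why StAX may be a bit easier to work with.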

(StAX is pretty much equivalent in this regard. I usually prefer to stay with SAX since it's part of the JAXP API package and should be present in just about every Java environment at this point, but StAX may be a bit easier to work with.)
