简体   繁体   中英

Write huge XML file from DOM to file

I have a java program which queries a table which has millions of records and generates a xml with each record as node.

The challenge is that the program is running out of heap memory. I have allocated 2GB heap for the program.

I am looking for alternate approaches of creating such huge xml.

Can we write out partial DOM object to file and release the memory?
For eg, create 100 nodes in DOM object, write to file, release the memory, then create next 100 nodes in DOM etc

Code to write a node to file

DOMSource source = new DOMSource(node);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);

But how do I release the DOM memory after writing the nodes to file?

Why do you need to generate a DOM? Try to write the XML directly. The most convenient API for outputting XML from Java is the StAX XMLStreamWriter interface. There are a number of implementations of XMLStreamWriter that generate lexical (serialized) XML, including the Saxon serializer which gives you considerable control over the way in which it is serialized (eg indentation and encoding) if you need it.

I would use a simple OutputStreamWriter and format the xml by myself, you don't need to create a huge dom structure. I think this is the fastest way.

Of course depends on how much xml structure you want to accomplish. If one table row corresponds to one xml line, this should be the fastest way to do it.

For processing a huge document, SAX is often preferred precisely because it keeps in memory only what you have explicitly decided to keep in memory -- which means you can use a specialized, and hence smaller, data model. For tasks such as this one, where you have no need to crossreference different parts of the document, you may not need any data model at all and can just generate SAX events directly from the input data and feed those into the serializer.

(StAX is pretty much equivalent in this regard. I usually prefer to stay with SAX since it's part of the JAXP API package and should be present in just about every Java environment at this point, but StAX may be a bit easier to work with.)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM