
Editing a BIG XML via DOM parser

Suppose a very large XML document is parsed with a DOM parser, and there is now a requirement to add or delete elements, i.e. to edit the XML. How can the document be edited if it cannot be loaded in full because of memory constraints? What strategy would solve this?

Consider using a SAX parser instead, which does not keep the whole document in memory. It is typically faster and uses much less memory.
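As a rough illustration of why SAX stays small in memory, here is a minimal sketch that counts elements by name using the JDK's built-in SAX parser. The class name `SaxCountSketch` and the method `countElements` are made up for this example; only the counter is kept between events.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of any library.
class SaxCountSketch {

    /** Counts elements with the given name. The parser streams the input, so no tree is built. */
    static int countElements(InputStream in, String elementName) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so qName is the plain tag name.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><note/></orders>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "order")); // prints 2
    }
}
```

For a real file you would pass a `FileInputStream` (or a `File`) instead of the in-memory stream used here.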

As two other answers have already mentioned, a SAX parser will do the trick. Your other alternative to DOM is a StAX parser.

Traditionally, XML APIs are either:

  • DOM based - the entire document is read into memory as a tree structure for random access by the calling application
  • event based - the application registers to receive events as entities are encountered within the source document.

Both have advantages: the former (for example, DOM) allows random access to the document, while the latter (e.g. SAX) requires a small memory footprint and is typically much faster.

These two access metaphors can be thought of as polar opposites. A tree based API allows unlimited, random access and manipulation, while an event based API is a 'one shot' pass through the source document.

StAX was designed as a middle ground between these two opposites. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward - 'pulling' the information from the parser as it needs. This is different from an event based API - such as SAX - which 'pushes' data to the application - requiring the application to maintain state between events as necessary to keep track of location within the document.
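To make the pull model concrete for the editing case in the question, here is a minimal sketch that copies a document from a reader to a writer with the JDK's StAX event API, deleting one kind of element and adding a new child under another. The class `StaxEditSketch`, the method `edit`, and all of its parameters are hypothetical names chosen for this example. It matches on local names only and ignores namespaces.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper for illustration; not part of any library.
class StaxEditSketch {

    /**
     * Streams the input to the output. Every element named dropName is removed
     * together with its subtree, and a new element newName containing newText is
     * inserted as the first child of every element named parentName.
     */
    static void edit(Reader in, Writer out, String dropName, String parentName,
                     String newName, String newText) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();
        int skipDepth = 0; // greater than 0 while inside a subtree that is being deleted

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (skipDepth > 0) {
                // Track nesting so the skip ends at the matching end tag.
                if (event.isStartElement()) {
                    skipDepth++;
                } else if (event.isEndElement()) {
                    skipDepth--;
                }
                continue;
            }
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(dropName)) {
                skipDepth = 1; // start deleting; do not write this event
                continue;
            }
            writer.add(event); // everything else is copied through unchanged
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(parentName)) {
                writer.add(events.createStartElement("", "", newName));
                writer.add(events.createCharacters(newText));
                writer.add(events.createEndElement("", "", newName));
            }
        }
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        edit(new StringReader("<root><keep>1</keep><old>2<x/></old></root>"),
             out, "old", "root", "added", "hi");
        System.out.println(out);
    }
}
```

Only the current event and the skip counter are held in memory, so the input size is not the limiting factor. With real files, the reader and writer would wrap file streams, and the result is normally written to a new file that replaces the original once the copy succeeds.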

StAX is my preferred approach for handling large documents. If DOM is a requirement, check out DOM implementations such as Xerces that support lazy construction of DOM nodes.
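For the Xerces route, a minimal sketch follows. It assumes the JAXP implementation in use is Xerces-based (as the JDK's built-in one is) and asks for the Xerces `defer-node-expansion` feature; if the factory does not recognize the feature, it falls back to the default behaviour. The class `DeferredDomSketch` and the method `parse` are hypothetical names. Note that deferred expansion only delays the creation of node objects; the document data is still held in memory, so this reduces the footprint rather than removing the limit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical helper for illustration; not part of any library.
class DeferredDomSketch {

    // Xerces feature id for lazy (deferred) DOM node creation.
    static final String DEFER_FEATURE =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setFeature(DEFER_FEATURE, true);
        } catch (ParserConfigurationException e) {
            // Assumption did not hold: this factory is not Xerces-based.
            // Continue with whatever the implementation does by default.
        }
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/></a>".getBytes(StandardCharsets.UTF_8));
        System.out.println(doc.getDocumentElement().getTagName()); // prints a
    }
}
```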

Your assumption that memory constraints prevent loading the XML document may apply only to DOM. VTD-XML loads the entire XML document into memory and does so efficiently, both in memory use (about 1.3x the size of the XML document) and in performance:

http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf

Another distinct benefit, one that no other XML framework has, is its incremental update capability:

http://www.devx.com/xml/Article/36379

As stivlo mentioned, you can use a SAX parser to read the XML.

For writing, however, you can write the XML to a FileOutputStream as plain text. I am sure the requirement will specify after which tag or under which tag the new data should be inserted.
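A minimal sketch of that idea: a SAX handler that re-emits each event as text and writes an extra snippet after every closing tag with a given name. The class `SaxCopyInsertSketch`, the method `copyWithInsert`, and the parameters `afterTag` and `snippet` are hypothetical names for this example. The sketch assumes the snippet is already well-formed XML, and it does not preserve comments, processing instructions, the DOCTYPE, or the XML declaration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of any library.
class SaxCopyInsertSketch extends DefaultHandler {
    private final Writer out;
    private final String afterTag; // insert after each closing tag with this name
    private final String snippet;  // raw XML text to insert, assumed well-formed

    SaxCopyInsertSketch(Writer out, String afterTag, String snippet) {
        this.out = out;
        this.afterTag = afterTag;
        this.snippet = snippet;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException {
        StringBuilder sb = new StringBuilder("<").append(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            sb.append(' ').append(attrs.getQName(i))
              .append("=\"").append(escape(attrs.getValue(i))).append('"');
        }
        write(sb.append('>').toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        write("</" + qName + ">");
        if (qName.equals(afterTag)) {
            write(snippet); // the new data goes right after the matching end tag
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        write(escape(new String(ch, start, length)));
    }

    private void write(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Re-escape characters that the parser has already decoded.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static void copyWithInsert(InputStream in, Writer out, String afterTag, String snippet)
            throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(in, new SaxCopyInsertSketch(out, afterTag, snippet));
        out.flush();
    }
}
```

To write to a file as the answer suggests, the `Writer` could be an `OutputStreamWriter` wrapped around a `FileOutputStream` with an explicit UTF-8 charset. Inserting "under" a tag instead of after it would mean writing the snippet in `startElement`, or just before the end tag in `endElement`.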
