
Processing an XML file with a huge amount of data

I am working on an application which has the following requirements:

  1. Download a ZIP file from a server.
  2. Uncompress the ZIP file, get the content (which is in XML format) from this file into a String.
  3. Pass this content into another method for parsing and further processing.

My concern is that the XML file may be huge, say 100 MB, and my JVM has only 512 MB of memory. How can I read this content in chunks, pass it on for parsing, and then insert the data into PL/SQL tables?

Multiple requests can be running at the same time, so given the 512 MB of memory, what is the best way to process this?

How can I get the data in chunks and pass it as a stream for XML parsing?

Java's XMLReader is the interface for SAX2 parsers. Where a DOM parser reads in the whole XML file and creates an (often large) data structure, usually a tree, to represent its contents, a SAX parser lets you register a handler that is called as pieces of the XML document are recognized. In that callback code, you can keep only as much data as you need. For example, you might save all the fields that will end up as a single row in the database, insert that row, and then discard the data. With this type of design, your program's memory consumption depends less on the file size than on the complexity and size of a single logical data item (in your case, the data that will become one row in the database).
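As a minimal sketch of that callback design, the handler below builds one row at a time and hands it to a consumer as soon as the row's closing tag is seen. The element name "record", the class name RowHandlerSketch and the Consumer-based sink are assumptions made for illustration; in a real application the consumer would be whatever performs the database insert.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class RowHandlerSketch {

    /** Collects the child elements of each <record> (hypothetical name) into one map. */
    static final class RowHandler extends DefaultHandler {
        private final Consumer<Map<String, String>> sink;
        private final StringBuilder text = new StringBuilder();
        private Map<String, String> row; // non-null only while inside a <record>

        RowHandler(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("record".equals(qName)) {
                row = new LinkedHashMap<>();
            }
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so append.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("record".equals(qName)) {
                sink.accept(row); // e.g. insert the row into the database here
                row = null;       // discard it so memory use stays per-row
            } else if (row != null) {
                row.put(qName, text.toString().trim());
            }
        }
    }

    /** Streams the XML from the input and passes each completed row to the sink. */
    static void parse(InputStream in, Consumer<Map<String, String>> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new RowHandler(sink));
    }
}
```

Only the current row and the text of the current element are held at any moment, so memory use should track the size of one record rather than the size of the file.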

Even if you did use a DOM-style parser, things might not be quite as bad as you expect. XML is pretty verbose, so depending on how it is structured, a 100 MB file will often represent only 10-20 MB of data, and as little as 5 MB of data wouldn't be particularly rare or unbelievable.

Any SAX parser should work, because it does not load the entire XML file into memory the way a DOM parser does.
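To avoid building the 100 MB String in the first place, the ZIP entry can be handed to the SAX parser as a stream. The sketch below assumes the archive's first file entry is the XML document; the class and method names (ZipSaxSketch, parseFirstEntry) are hypothetical. It stops after one entry on purpose, since some parser implementations close the stream they are given, which would prevent reading further entries.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

final class ZipSaxSketch {

    /**
     * Parses the first file entry of a ZIP stream with the given SAX handler.
     * Returns false if the archive contains no file entry.
     */
    static boolean parseFirstEntry(InputStream zipStream, DefaultHandler handler) throws Exception {
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // skip directory entries
                }
                // ZipInputStream reports end-of-stream at the end of the current
                // entry, so the parser sees exactly one decompressed XML document.
                SAXParserFactory.newInstance().newSAXParser().parse(zin, handler);
                return true;
            }
            return false;
        }
    }
}
```

The zipStream argument could be the download's response stream itself, so the data is decompressed and parsed as it arrives instead of being fully buffered first.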
