Parsing huge XML file to form a DOM tree

I have a huge XML file (around 904 MB) and my aim is to build a DOM tree from it, using the following code:

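    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;
    import org.w3c.dom.traversal.DocumentTraversal;
    import org.w3c.dom.traversal.NodeFilter;
    import org.w3c.dom.traversal.NodeIterator;

    public class DomDump {
        public static void main(String[] args) throws Exception {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse() reads the entire file into an in-memory tree before returning
            Document doc = builder.parse(new File("xml_file"));
            // Top-level node of the document
            Node html = doc.getFirstChild();

            DocumentTraversal traversal = (DocumentTraversal) doc;
            NodeIterator iterator = traversal.createNodeIterator(
                    doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

            // Visit every element in document order, printing its tag name and attribute names
            for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
                Element element = (Element) n;
                System.out.println("Element: " + element.getTagName());

                NamedNodeMap map = element.getAttributes();
                for (int i = 0; i < map.getLength(); i++) {
                    System.out.println(map.item(i).getNodeName());
                }
            }
        }
    }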

However, because the XML file is so large, building the DOM tree takes a very long time. Is there any trick to speed this up?

Use the StAX library. StAX is an event-based pull API for processing XML. It reads from an InputStream incrementally, so the document is never loaded into memory as a whole DOM tree, which keeps its memory footprint small.
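A minimal sketch of the StAX approach, assuming the JDK's built-in `javax.xml.stream` implementation (included in Java SE since Java 6). It uses the cursor API (`XMLStreamReader`) to print the same element and attribute names as the DOM loop in the question, but without building a tree. The class name `StaxDump` and the method `dump` are hypothetical; `xml_file` is the placeholder path from the question.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxDump {

    // Joins an optional prefix and a local name, e.g. "p" + "item" -> "p:item".
    private static String qualified(String prefix, String localName) {
        return (prefix == null || prefix.isEmpty()) ? localName : prefix + ":" + localName;
    }

    // Pulls parse events one at a time from the stream. No document tree is built,
    // so the file is never held in memory as a whole.
    static void dump(InputStream in, Consumer<String> sink) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Same information the DOM loop prints: the tag name, then each attribute name.
                    sink.accept("Element: " + qualified(reader.getPrefix(), reader.getLocalName()));
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        sink.accept(qualified(reader.getAttributePrefix(i), reader.getAttributeLocalName(i)));
                    }
                }
            }
        } finally {
            reader.close(); // frees parser resources; the caller still owns and closes the stream
        }
    }

    public static void main(String[] args) throws Exception {
        // "xml_file" is the placeholder path from the question; pass a real path as args[0].
        String path = args.length > 0 ? args[0] : "xml_file";
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            dump(in, System.out::println);
        }
    }
}
```

Output goes to a `Consumer` rather than into a list so that nothing accumulates in memory as the file is read. One difference to be aware of: StAX readers are namespace-aware by default and `getAttributeCount()` excludes namespace declarations, whereas the non-namespace-aware DOM in the question also lists `xmlns` attributes.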

This page lists the reasons for using StAX and compares it with other methods.

If you haven't read this paper ( http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf ), it compares a comprehensive list of current XML processing libraries, and after reading it the best option should be clear to you.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 