Reading a list of XML elements in Java

Question

I would like to iterate over an XML document that is essentially a list of identically structured XML elements. The elements will be deserialized into Java objects.

<root>
    <element attribute="value" />
    <element attribute="value" />
    <element attribute="value" />
    ...
</root>

There are a lot of elements within the root element, and I would prefer not to load them all into memory. I realize I could use a SAX handler for this, but using a SAX handler to deserialize everything into Java objects seems rather cumbersome. I find JDOM very easy to use, but as far as I can tell JDOM always parses the entire tree. Is there a way I can use JDOM to parse the subelements one at a time?

Another reason for using JDOM is it makes writing serialization/deserialization code easy for the corresponding Java objects, which are meaningless if not entirely in memory. However, I don't want to load all of the Java objects into memory at the same time. Rather, I want to iterate over them once.

Update: here is an example of how to do this in dom4j: http://docs.codehaus.org/display/GROOVY/Reading+XML+with+Groovy+and+DOM4J . Is there any way to do this in JDOM?

Answer 1

Why not use StAX (javax.xml.stream.*, an implementation is included in Java SE 6) to stream in the XML, and convert individual portions to objects?

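import java.io.FileReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Element.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();

        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // advance to <root>
        xsr.next();    // step inside <root>
        while (true) {
            // Skip whitespace and comments without stepping over a start tag.
            // unmarshal() leaves the reader on the token right after the end tag,
            // which is already the next <element> when there is no whitespace between them.
            while (!xsr.isStartElement() && !xsr.isEndElement()) {
                xsr.next();
            }
            if (xsr.isEndElement()) {
                break; // reached </root>
            }
            // Unmarshal exactly one <element>; only this object is held in memory.
            Element element = (Element) unmarshaller.unmarshal(xsr);
            System.out.println(element.getAttribute());
        }
        xsr.close();
    }

}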

In the above example each individual "element" is unmarshalled into a POJO using JAXB (an implementation is included in Java SE 6), but you could process the fragment as you see fit. JAXB model details below:

import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Element {

    private String attribute;

    @XmlAttribute
    public String getAttribute() {
        return attribute;
    }

    public void setAttribute(String attribute) {
        this.attribute = attribute;
    }

}

Note:

StAX and JAXB are also compatible with Java SE 5; you just need to download the implementations separately.
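
If JAXB is more than you need, the same streaming loop can build each object by hand. The following is only a minimal sketch, assuming the document shape from the question; `StaxListDemo`, `Item`, and `readItems` are hypothetical names introduced for illustration, and the callback stands in for whatever processing you do per object.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxListDemo {

    /** Hypothetical value object standing in for the real domain class. */
    public static final class Item {
        public final String attribute;

        Item(String attribute) {
            this.attribute = attribute;
        }
    }

    /**
     * Streams the XML and passes each <element> to the callback as soon as
     * its start tag is read. Returns the number of elements seen.
     */
    public static int readItems(Reader in, Consumer<Item> callback) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.START_ELEMENT
                        && "element".equals(xsr.getLocalName())) {
                    // A null namespace URI means the namespace is not checked.
                    String value = xsr.getAttributeValue(null, "attribute");
                    callback.accept(new Item(value));
                    count++;
                }
            }
        } finally {
            xsr.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element attribute=\"a\"/><element attribute=\"b\"/></root>";
        int n = readItems(new StringReader(xml), item -> System.out.println(item.attribute));
        System.out.println(n + " elements read");
    }
}
```

Because each `Item` is handed to the callback and then dropped, only one object needs to be reachable at a time, which is the property the question asks for.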

Answer 2

You should use VTD-XML; it is mainly used for stream processing. I use it to read product feeds from advertisers.

The great advantage is that it only needs an XPath expression, it iterates over the XML at blazing speed, and it has a very small memory footprint (it only keeps a few pointers while iterating over the XML).

I know the site says it performs 5 to 12 times faster than parsing into a DOM, but in my experience, for your kind of task (especially if the size is in the hundreds of MB), you can easily get a 20x speedup.

Here is a simple example of how to read your XML using VTD-XML:

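import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

VTDGen vg = new VTDGen();
AutoPilot ap = new AutoPilot();
int i;
ap.selectXPath("/root/element");
// The second argument turns namespace awareness on
if (vg.parseFile(FILE_LOCATION, true)) {
    VTDNav vn = vg.getNav();
    ap.bind(vn); // apply XPath to the VTDNav instance
    // AutoPilot moves the cursor for you; i is the index of the matched element
    while ((i = ap.evalXPath()) != -1) {
        int attr = vn.getAttrVal("attribute"); // -1 if the attribute is missing
        if (attr != -1) {
            System.out.println("the element index val is " +
                i + " the attribute string ==>" + vn.toString(attr));
        }
    }
}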

Answer 3

Short answer: No. JDOM is about parsing XML and turning it into a data structure to perform operations on. This means always deserializing the entire XML document.

Answer 4

One easy approach that would cut down on your memory requirements would be to use XPath with JDOM to query a subset of your XML and get only those bits that satisfy your query.

Otherwise you could check out this interesting hint from Elliotte Rusty Harold. It indicates that the streaming API you want is there, just not advertised:

JDOM does have a streaming API. It's just sort of hidden and not widely advertised or explained. In XOM, I made this approach a lot more explicit and documented it. If a streaming tree model is what you want, you're probably better off with XOM, but if you have to stick with JDOM then reading the XOM examples will probably give you enough clue about how to use JDOM in streaming mode.
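
To make the "streaming tree model" idea concrete, here is a rough sketch that uses only JDK classes rather than JDOM's or XOM's own streaming API: StAX walks the document, and an identity `Transformer` copies one child element at a time into a small DOM tree. `FragmentTreeDemo` and `forEachChild` are hypothetical names, and the loop assumes the JDK's built-in transformer, which leaves the reader just past the end tag of the fragment it copied. Converting each small DOM element into a JDOM element (for example with JDOM's `DOMBuilder`) is left out and would be an extra step on your side.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FragmentTreeDemo {

    /**
     * Builds a separate small DOM tree for each child of the root element and
     * passes it to the callback. Returns the number of children processed.
     */
    public static int forEachChild(Reader in, Consumer<Element> callback) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        int count = 0;
        try {
            xsr.nextTag(); // position on the root start tag
            xsr.next();    // step inside the root element
            while (true) {
                // Skip whitespace, comments, etc. without stepping over a start tag.
                while (!xsr.isStartElement() && !xsr.isEndElement()) {
                    xsr.next();
                }
                if (xsr.isEndElement()) {
                    break; // closing tag of the root element
                }
                // Copy exactly one child subtree into a fresh DOM document.
                DOMResult result = new DOMResult();
                identity.transform(new StAXSource(xsr), result);
                Document fragment = (Document) result.getNode();
                callback.accept(fragment.getDocumentElement());
                count++;
            }
        } finally {
            xsr.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element attribute=\"a\"/>\n  <element attribute=\"b\"/></root>";
        forEachChild(new StringReader(xml),
                e -> System.out.println(e.getAttribute("attribute")));
    }
}
```

Each callback invocation sees a complete tree for one `<element>`, so tree-based deserialization code can stay as it is while the document as a whole is never held in memory.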

4 answers

solution1
3 2011-04-20 20:54:08

solution2
2 2011-04-20 19:18:30

solution3
0 2011-04-20 17:41:31

solution4
0 ACCEPTED 2011-04-20 18:33:13
