Standardise on a XML reader methodology

Question

In an open source project I maintain, we have at least three different ways of reading, processing and writing XML files and I would like to standardise on a single method for ease of maintenance and stability.

Currently all of the project files use XML from the configuration to the stored data, we're hoping to migrate to a simple database at some point in the future but will still need to read/write some form of XML files.

The data is stored in an XML format that we then use a XSLT engine (Saxon) to transform into the final HTML files.

We currently utilise these methods: - XMLEventReader/XMLOutputFactory (javax.xml.stream) - DocumentBuilderFactory (javax.xml.parsers) - JAXBContext (javax.xml.bind)

Are there any obvious pros and cons to each of these? Personally, I like the simplicity of DOM (Document Builder), but I'm willing to convert to one of the others if it makes sense in terms of performance or other factors.

Edited to add: There can be a significant number of files read/written when the project runs, between 100 & 10,000 individual files of around 5Kb each

Answer 1

It depends on what you are doing with the data.

If you are simply performing XSLT transforms on XML files to produce HTML files then you may not need to touch a parser directly:

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();    
        StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
        Transformer transformer = tf.newTransformer(xsltTransform);

        StreamSource source = new StreamSource(new File("source.xml"));

        StreamResult result = new StreamResult(new File("result.html"));
        transformer.transform(source, result);            
    }

}

If you need to make changes to the input document before you transform it, DOM is a convenient mechanism for doing this:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class Demo {

    public static void main(String[] args) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
        Transformer transformer = tf.newTransformer(xsltTransform);

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new File("source.xml"));
        // modify the document
        DOMSource source = new DOMSource(document);

        StreamResult result = new StreamResult(new File("result.html"));
        transformer.transform(source, result);  
    }

}

If you prefer a typed model to make changes to the data then JAXB is a perfect fit:

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.util.JAXBSource;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
        Transformer transformer = tf.newTransformer(xsltTransform);

        JAXBContext jc = JAXBContext.newInstance("com.example.model");
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        Model model = (Model) unmarshaller.unmarshal(new File("source.xml"));
        // modify the domain model
        JAXBSource source = new JAXBSource(jc, model);

        StreamResult result = new StreamResult(new File("result.html"));
        transformer.transform(source, result);            
    }

}

Answer 2

This is a very subjective topic. It primarily depends on how you are going to use the xml and size of XML. If XML is (always) small enough to be loaded in to memory, then you don't have to worry about memory foot print. You can use DOM parser. If you need to a parse through 150 MB xml you may want to think of using SAX. etc.

Standardise on a XML reader methodology

Question

2 answers

solution1
1 ACCPTED 2010-10-26 14:46:22

solution2
0 2010-10-26 10:16:22

Standardise on a XML reader methodology

Question

2 answers

solution1 1 ACCPTED 2010-10-26 14:46:22

solution2 0 2010-10-26 10:16:22

solution1
1 ACCPTED 2010-10-26 14:46:22

solution2
0 2010-10-26 10:16:22