
Read from StAX XMLStreamReader to unmarshal partial

I'm working with the StAX cursor API to extract data from large XML files. Currently I advance to the start of a particular tag and unmarshal that element with JAXB. That works fine on well-formed XML files. But not long ago I had a document in which one of hundreds of thousands of tags was not closed. JAXB stepped through the XMLStreamReader until the end of the document and then failed. Is there a way to read from an opening tag to its closing tag and unmarshal that fragment separately? That way I would lose two tags to an exception rather than the rest of the document. The only way I found was to use a plain BufferedReader instead of the XMLStreamReader and check the content line by line, but that solution seems ugly to me.

I've had reasonable success using Jackson to deserialise XML fragments. When an individual read fails, the process can recover by advancing the cursor to the next fragment:

import com.fasterxml.jackson.dataformat.xml.XmlMapper;

import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class XmlFragmentReader {
    public static void main(String[] args) throws XMLStreamException {
        String xml =
            "<list>\n" +
            "<object><name>a</name></object>\n" +
            "<object><name>b</name>\n" + // Missing closing tag
            "<object><name>c</name></object>\n" +
            "<object><name>d</name></object>\n" +
            "<object><name>e</name></object>\n" +
            "</list>";

        XMLStreamReader reader = XMLInputFactory
            .newInstance()
            .createXMLStreamReader(new StringReader(xml));

        XmlMapper mapper = new XmlMapper();
        while (next(reader, "object")) {
            try {
                Obj obj = mapper.readValue(reader, Obj.class);
                System.out.println("Read: " + obj.getName());
            } catch (Exception e) {
                System.err.println("Read Failed: " + e);
            }
        }
    }

    // Advance cursor to the opening tag <name>
    private static boolean next(XMLStreamReader reader, String name) throws XMLStreamException {
        while (true) {
            if (reader.getEventType() == XMLStreamConstants.START_ELEMENT && reader.getLocalName().equals(name)) {
                return true;
            } else if (!reader.hasNext()) {
                return false;
            }
            reader.next();
        }
    }

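    // Test object. Jackson only honours the JAXB annotations if its JAXB
    // annotations module is registered; here the default property name already matches.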
    @XmlRootElement(name = "object")
    public static class Obj {
        private String name;

        @XmlElement
        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }
}

Output:

Read: a
Read: d
Read: e
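
The same recovery idea can also be sketched with nothing but the JDK's StAX cursor API, without Jackson or JAXB. This is only a minimal sketch under two assumptions that are not in the original post: an <object> never legitimately contains another <object>, and each record holds a single text-only <name>. The class FragmentRecovery and the method readNames are hypothetical names used for illustration. When a new <object> starts while an earlier one is still open, the earlier one is treated as unclosed and dropped:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class FragmentRecovery {

    // Returns the <name> text of every <object> record that was properly closed.
    // Assumption: an <object> never legitimately contains another <object>.
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        String pending = null; // name seen in the record that is currently open
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if (tag.equals("object")) {
                        // A record starts. If an earlier one was still open,
                        // it was never closed, so its data is discarded here.
                        pending = null;
                    } else if (tag.equals("name")) {
                        // Reads the text and leaves the cursor on </name>.
                        pending = reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("object")) {
                    // The record closed cleanly: keep it.
                    if (pending != null) {
                        names.add(pending);
                    }
                    pending = null;
                }
            }
        } catch (XMLStreamException e) {
            // The parser stops at the first mismatched end tag;
            // everything collected up to that point is still returned.
            System.err.println("Stopped early: " + e.getMessage());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml =
            "<list>\n" +
            "<object><name>a</name></object>\n" +
            "<object><name>b</name>\n" + // Missing closing tag
            "<object><name>c</name></object>\n" +
            "<object><name>d</name></object>\n" +
            "<object><name>e</name></object>\n" +
            "</list>";
        System.out.println(readNames(xml));
    }
}
```

On the sample document above this should keep a, c, d and e and drop only b, because the parser does not notice the missing closing tag until it reaches </list>. The trade-off is that the field extraction is hand-written rather than delegated to a mapper, so it suits simple records better than deeply structured ones.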

