
Inconsistency in deserializing objects with Jackson streaming API

I am trying to use the Jackson streaming API to deserialize huge objects from XML. The idea is to combine the streaming API with ObjectMapper to parse XML (or JSON) in small chunks. However, I see some inconsistent behavior with the XML parser. With this code snippet:

try {
    String xml1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo></foo>";
    String xml2 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
    XmlFactory xmlFactory = new XmlFactory();

    // Encode explicitly as UTF-8 so the bytes match the XML declaration.
    JsonParser jp = xmlFactory.createParser(new ByteArrayInputStream(xml1.getBytes(StandardCharsets.UTF_8)));
    JsonToken token = jp.nextToken();
    while (token != null) {
        System.out.println("xml1 token=" + token);
        token = jp.nextToken();
    }

    jp = xmlFactory.createParser(new ByteArrayInputStream(xml2.getBytes(StandardCharsets.UTF_8)));
    token = jp.nextToken();
    while (token != null) {
        System.out.println("xml2 token=" + token);
        token = jp.nextToken();
    }
} catch (IOException e) {
    e.printStackTrace();
}

I am getting:

  • xml1 token=START_OBJECT
  • xml1 token=END_OBJECT
  • xml2 token=START_OBJECT
  • xml2 token=FIELD_NAME
  • xml2 token=VALUE_NULL
  • xml2 token=END_OBJECT

Why is the FIELD_NAME token missing for xml1? Why is there just one START_OBJECT token for the second XML document? Is there any setting that would allow me to see the FIELD_NAME of the outer tag?

The problem is quite simple: the XML module differs from most other Jackson dataformat modules in that direct access via the Streaming API is not supported. This is mentioned in the project README (along with a note that the "tree model" is similarly not supported). "Not supported" does not necessarily mean "cannot be used at all", just that its behavior differs from the handling of JSON, so callers really need to know what they are doing above and beyond the API used for JSON content (and for Smile, CBOR and YAML; even CSV content is represented in a way that is compatible with JSON access).

While you can try to use XmlFactory and the streaming parser/generator, their behavior is controlled by XmlMapper based on metadata from Java classes, so that things work correctly via the databinding API (that is, XmlMapper).

With that, the reason for the observed tokens is that such a translation is necessary to map to the expected Java object structure:

public class Foo { public Bar bar; }

which would map to JSON like:

{ "bar" : null }

as well as XML of

<foo> <bar></bar> </foo>

Another way to put this is that the XML and JSON data models are fundamentally different and cannot be trivially translated into each other. Since Jackson's token model is based on JSON, some work is needed to translate XML elements and attributes into the structure that the equivalent JSON would have.

The above is not to say that what you are trying to do is impossible. There are two ways you might be able to make things work:

  1. Knowing the translation that XmlParser does, call nextToken() and expect the translated tokens.
  2. Instead of using XmlParser directly, construct an XMLStreamReader (the low-level StAX streaming parser), read the "raw" tokens, then construct a separate XmlParser (via XmlFactory) at the expected location and use that for reading.
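
The second option can be sketched using only the JDK's StAX API. This is a minimal illustration under stated assumptions, not Jackson's own code: the class name RawStaxEvents and the helper elementEvents are made up for this example. It only shows that the raw reader reports the outer element name that the XmlParser token stream folds away; the hand-off to an XmlParser built via XmlFactory is marked by a comment and left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class RawStaxEvents {

    // Returns the element start/end events in document order, e.g. "START foo".
    static List<String> elementEvents(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // This is the point where one could stop reading raw
                        // events and hand the reader over to databinding.
                        events.add("START " + reader.getLocalName());
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        events.add("END " + reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(elementEvents("<foo></foo>"));
        // prints [START foo, END foo]
        System.out.println(elementEvents("<foo><bar></bar></foo>"));
        // prints [START foo, START bar, END bar, END foo]
    }
}
```

Unlike the token stream in the question, both documents report the root element foo explicitly, so the caller can decide at which element to switch over to databinding.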

I hope this helps.

A kid with a hammer...

I don't know much about Jackson; in fact, I just started using it, thinking of using JSON or YAML instead of XML. But for XML, we have been using XStream with success.

//Consumer side
    FileInputStream fis = new FileInputStream(filename);
    XStream xs = new XStream();
    Object obj = xs.fromXML(fis);
    fis.close();

Also, if you are producing the serialized data yourself and the producer is Java as well, you could use Java serialization throughout for a lower footprint and faster operation.

//producer side
    FileOutputStream fos = new FileOutputStream(filename);
    ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(fos));
    oos.writeObject(yourVeryComplexObjectStructure); //I am writing a list of ten 1MB objects
    oos.flush();
    oos.close();
    fos.close();


//Consumer side
    final FileInputStream fin = new FileInputStream(filename);
    final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(fin));
    @SuppressWarnings("unchecked")
    final YourVeryComplexObjectStructureType object = (YourVeryComplexObjectStructureType) ois.readObject();
    ois.close();
    fin.close();
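
For reference, the same producer/consumer pair can be tried out in memory. This is only a sketch under assumptions: the class name SerializationRoundTrip and its write/read helpers are invented for the example, a small list of strings stands in for the complex object structure, and byte arrays replace the files. It uses try-with-resources so the streams are closed even if an exception is thrown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class SerializationRoundTrip {

    // Producer side: serialize any Serializable value to a byte array.
    static byte[] write(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return bytes.toByteArray();
    }

    // Consumer side: rebuild the object graph from the bytes.
    static Object read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Object copy = read(write(original));
        System.out.println(copy.equals(original)); // prints true
    }
}
```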
