StAX and DOM using XMLEventReader rather than XMLStreamReader

I'd like to write some code essentially analogous to "Reading a big XML file using stax and dom", but using XMLEventReader rather than XMLStreamReader (I need to be able to check the value of some elements before going ahead and creating the DOM).

Does anyone have a minimal example of how this might look? Everything I've tried so far gives me errors or NullPointerExceptions.

Thanks! Arlo

Have a look at: http://www.vogella.com/articles/JavaXML/article.html#javastax_read

It gives a nice, small example of how to use XML streaming with XMLEventReader.
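For reference, here is a minimal sketch of the kind of loop that tutorial describes, checking element values with XMLEventReader. The class name EventReaderSketch, the helper textOf and the sample XML are made up for illustration, and the helper wraps XMLStreamException in an unchecked exception only to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EventReaderSketch {

  // Hypothetical helper: collects the text of every element whose local name matches.
  static List<String> textOf(String xml, String localName) {
    List<String> values = new ArrayList<>();
    try {
      XMLEventReader reader =
          XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
      while (reader.hasNext()) {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement()
            && event.asStartElement().getName().getLocalPart().equals(localName)) {
          // getElementText() reads a text-only element up to its END_ELEMENT.
          values.add(reader.getElementText());
        }
      }
      reader.close();
    } catch (XMLStreamException e) {
      throw new IllegalStateException("could not parse XML", e);
    }
    return values;
  }

  public static void main(String[] args) {
    // Sample input invented for this sketch.
    String xml = "<items><item><name>a</name></item><item><name>b</name></item></items>";
    System.out.println(textOf(xml, "name"));
  }
}
```

Note that getElementText() only works for elements that contain nothing but text; for elements with child elements you would have to walk the events yourself.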

I'm having the same issue, and as far as I could tell from debugging, everything indicates that there's a bug in the JDK (at least in build 1.8.0_162-b12), more specifically in the class com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX.

The NPE is actually only a consequence of another bug, which is related to how the reader is handled in this class's bridge() method. There, if the reader is not in the START_DOCUMENT state, the next event is only peeked at on the very first iteration; the reader is not advanced with nextEvent(). As a result, the first START_ELEMENT event is processed twice. This is easy to observe if you use a StreamResult instead of a DOMResult. In that case the NPE does not occur, but the XML produced in the result stream contains the start tag of the first element twice.

I'm now trying to work around this with an XMLEventWriter that wraps the DOMResult. So, basically, simulating what the Transformer would do by pushing each event read directly to that writer. If I succeed, I'll post my solution here as well.

PS: I would like to report this issue to the JDK project, or possibly even contribute a fix for it. If anybody could tell me how this is supposed to be done, I would very much appreciate it.

UPDATE:

So, I managed to work around this issue with the approach mentioned above. Based on the code suggested in "Reading a big XML file using stax and dom", instead of using the Transformer, you could use the following method:

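  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.parsers.ParserConfigurationException;
  import javax.xml.stream.XMLEventReader;
  import javax.xml.stream.XMLEventWriter;
  import javax.xml.stream.XMLOutputFactory;
  import javax.xml.stream.XMLStreamException;
  import javax.xml.stream.events.XMLEvent;
  import javax.xml.transform.dom.DOMResult;
  import org.w3c.dom.Document;
  import org.w3c.dom.Node;

  // Copies the element the reader is positioned on (and its whole subtree)
  // into a new DOM document and returns that element.
  private Node readToNode(final XMLEventReader reader) throws XMLStreamException, ParserConfigurationException {
    // peek() inspects the next event without consuming it
    XMLEvent event = reader.peek();
    if (!event.isStartElement()) {
      throw new IllegalArgumentException("reader must be on START_ELEMENT event");
    }
    final Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    final XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(new DOMResult(document));
    // depth tracks nesting so we stop at the END_ELEMENT matching the first START_ELEMENT
    int depth = 0;
    do {
      event = reader.nextEvent();
      writer.add(event);
      if (event.isStartElement()) {
        depth++;
      } else if (event.isEndElement()) {
        depth--;
      }
    } while (reader.hasNext() && !(event.isEndElement() && depth <= 0));
    return document.getDocumentElement();
  }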

However, this approach has some limitations! As the code shows, we need to create a Document object to wrap the node, otherwise the XML writer runs into issues. If you intend to manipulate this DOM and afterwards send it to another XMLEventWriter that is already active (as I was trying to do) using the Transformer again, it will fail. This is because the Transformer sends a START_DOCUMENT event to a writer that has already started. I also tried the other way round, i.e. wrapping the node in a DOMSource, creating another XMLEventReader from it and piping the events to my existing XMLEventWriter, but that does not work either, as XMLEventReader apparently supports only StreamSources (see here).

In summary, if you only need the DOM objects, this could work well, but if you're trying to transform XML fragments by piping the events to a writer (as I am), you could run into issues.
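For the "only need the DOM objects" case, here is a rough sketch of how the method above might be combined with peek() to decide, per element, whether to build a DOM at all. The class name SelectiveDomSketch, the element name "record" and the attribute "keep" are invented for this example, and I have only reasoned about it against the JDK's built-in StAX implementation. Checked exceptions are wrapped just to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.dom.DOMResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SelectiveDomSketch {

  // Same idea as readToNode above: copy one element subtree into a fresh Document.
  static Element readToElement(XMLEventReader reader)
      throws XMLStreamException, ParserConfigurationException {
    if (!reader.peek().isStartElement()) {
      throw new IllegalArgumentException("reader must be on START_ELEMENT event");
    }
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    XMLEventWriter writer =
        XMLOutputFactory.newInstance().createXMLEventWriter(new DOMResult(document));
    int depth = 0;
    do {
      XMLEvent event = reader.nextEvent();
      writer.add(event);
      if (event.isStartElement()) {
        depth++;
      } else if (event.isEndElement()) {
        depth--;
      }
    } while (depth > 0 && reader.hasNext());
    return document.getDocumentElement();
  }

  // Hypothetical filter: build a DOM only for <record keep="true"> elements.
  static boolean wanted(XMLEvent event) {
    if (!event.isStartElement()) {
      return false;
    }
    StartElement start = event.asStartElement();
    Attribute keep = start.getAttributeByName(new QName("keep"));
    return start.getName().getLocalPart().equals("record")
        && keep != null && "true".equals(keep.getValue());
  }

  // Returns the text content of each record that was turned into a DOM element.
  static List<String> keptText(String xml) {
    List<String> result = new ArrayList<>();
    try {
      XMLEventReader reader =
          XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
      while (reader.hasNext()) {
        if (wanted(reader.peek())) {
          // The peeked START_ELEMENT is still unread, so readToElement sees it.
          result.add(readToElement(reader).getTextContent());
        } else {
          reader.nextEvent(); // skip this event without building anything
        }
      }
      reader.close();
    } catch (XMLStreamException | ParserConfigurationException e) {
      throw new IllegalStateException("could not process XML", e);
    }
    return result;
  }

  public static void main(String[] args) {
    // Sample input invented for this sketch.
    String xml = "<records><record keep=\"true\"><name>a</name></record>"
        + "<record keep=\"false\"><name>b</name></record></records>";
    System.out.println(keptText(xml));
  }
}
```

The check happens on the peeked StartElement, so nothing is consumed before the decision is made. If the value you need to test is in a child element rather than an attribute, you would have to buffer the events yourself, since peek() only looks one event ahead.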
