
StAX and DOM using XMLEventReader rather than XMLStreamReader

I'd like to write some code essentially analogous to "Reading a big XML file using stax and dom", but using XMLEventReader rather than XMLStreamReader (I need to be able to check the value of some elements before going ahead and creating the DOM).

Does anyone have a minimal example of how this might look? Everything I've tried so far gives me errors or NullPointerExceptions.

Thanks! Arlo

Have a look at: http://www.vogella.com/articles/JavaXML/article.html#javastax_read

It gives a nice, small example of how to use XML streaming and XMLEventReader.
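For readers who want something inline, here is a minimal sketch of the XMLEventReader loop that kind of tutorial describes. It is not taken from the linked article: the class name `EventReaderSketch`, the helper `elementTexts`, and the sample XML are all made up for illustration, and the helper wraps the checked XMLStreamException only to keep the call site short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
class EventReaderSketch {

  // Collects the text content of every element whose local name matches.
  static List<String> elementTexts(String xml, String localName) {
    List<String> texts = new ArrayList<>();
    try {
      XMLEventReader reader =
          XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
      while (reader.hasNext()) {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement()
            && event.asStartElement().getName().getLocalPart().equals(localName)) {
          // getElementText() reads up to the matching END_ELEMENT;
          // it only works for elements that contain text and no child elements.
          texts.add(reader.getElementText());
        }
      }
      reader.close();
    } catch (XMLStreamException e) {
      throw new IllegalStateException("could not read XML", e);
    }
    return texts;
  }

  public static void main(String[] args) {
    String xml = "<items><item><title>a</title></item><item><title>b</title></item></items>";
    System.out.println(elementTexts(xml, "title"));
  }
}
```

The same loop shape (hasNext, nextEvent, test the event type) is what the rest of this thread builds on.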

I'm having the same issue and, as far as I could debug, everything indicates that there's a bug in the JDK (at least in build 1.8.0_162-b12), more specifically in the class com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX.

The NPE is actually only a consequence of another bug, which is related to how the reader is handled in this class's bridge() method. There, if the reader is not in the START_DOCUMENT state, the next event is only peeked at and the reader is not advanced with nextEvent() the very first time. This leads to the first START_ELEMENT event being processed twice. This can be observed clearly if you use a StreamResult instead of a DOMResult. In that case the NPE does not occur, but the XML produced in the result stream will contain the start tag of the first element twice.

I'm now trying to work around this with an XMLEventWriter that receives the DOMResult, basically simulating what the Transformer would do by pushing each read event directly to that writer. If I succeed, I'll post my solution here as well.

PS: I would like to report this issue to the JDK, or eventually even push a potential fix for it. If anybody could tell me how this is supposed to be done, I would very much appreciate it.

UPDATE:

So, I managed to work around this issue with the approach mentioned above. Based on the code suggested in "Reading a big XML file using stax and dom", instead of using the Transformer, you could use the following method:

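  // Copies the element subtree at the reader's current position into a new DOM
  // document and returns that document's root element.
  private Node readToNode(final XMLEventReader reader) throws XMLStreamException, ParserConfigurationException {
    // peek() does not advance the reader; it returns null when no events are left.
    XMLEvent event = reader.peek();
    if (event == null || !event.isStartElement()) {
      throw new IllegalArgumentException("reader must be on START_ELEMENT event");
    }
    // The writer needs a Document as the target node of the DOMResult.
    final Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    final XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(new DOMResult(document));
    // Track nesting so that we stop at the END_ELEMENT matching the first START_ELEMENT.
    int depth = 0;
    do {
      event = reader.nextEvent();
      writer.add(event);
      if (event.isStartElement()) {
        depth++;
      } else if (event.isEndElement()) {
        depth--;
      }
    } while (reader.hasNext() && !(event.isEndElement() && depth <= 0));
    writer.close();
    return document.getDocumentElement();
  }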
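To tie this back to the original question (checking a value before creating the DOM), here is a hedged sketch of how readToNode might be driven. Everything around the method is an assumption for illustration: the class name `SelectiveDomSketch`, the `<record type="keep">` input format, and the helper names are hypothetical, and the sketch assumes `<record>` elements are not nested inside each other. The method body is repeated only so the example compiles on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.dom.DOMResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical driver class, for illustration only.
class SelectiveDomSketch {

  // Same logic as readToNode above, repeated so this sketch is self-contained.
  static Node readToNode(XMLEventReader reader)
      throws XMLStreamException, ParserConfigurationException {
    XMLEvent event = reader.peek();
    if (event == null || !event.isStartElement()) {
      throw new IllegalArgumentException("reader must be on START_ELEMENT event");
    }
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    XMLEventWriter writer =
        XMLOutputFactory.newInstance().createXMLEventWriter(new DOMResult(document));
    int depth = 0;
    do {
      event = reader.nextEvent();
      writer.add(event);
      if (event.isStartElement()) {
        depth++;
      } else if (event.isEndElement()) {
        depth--;
      }
    } while (reader.hasNext() && !(event.isEndElement() && depth <= 0));
    return document.getDocumentElement();
  }

  // Assumed selection rule: only <record type="keep"> is worth a DOM.
  static boolean isWanted(StartElement start) {
    Attribute type = start.getAttributeByName(new QName("type"));
    return start.getName().getLocalPart().equals("record")
        && type != null && "keep".equals(type.getValue());
  }

  // Builds a DOM only for wanted records and returns their text content.
  static List<String> keptRecordTexts(String xml) {
    List<String> texts = new ArrayList<>();
    try {
      XMLEventReader reader =
          XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
      while (reader.hasNext()) {
        XMLEvent next = reader.peek(); // inspect without consuming
        if (next.isStartElement() && isWanted(next.asStartElement())) {
          texts.add(readToNode(reader).getTextContent().trim());
        } else {
          reader.nextEvent(); // not interesting, move on
        }
      }
      reader.close();
    } catch (XMLStreamException | ParserConfigurationException e) {
      throw new IllegalStateException("could not process XML", e);
    }
    return texts;
  }

  public static void main(String[] args) {
    String xml = "<records><record type=\"keep\"><name>alpha</name></record>"
        + "<record type=\"skip\"><name>beta</name></record></records>";
    System.out.println(keptRecordTexts(xml));
  }
}
```

Because the decision is made on the peeked START_ELEMENT, only attributes (or the element name) are available for the check; deciding on the value of a child element would require buffering events first, which this sketch does not attempt.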

However, this approach has some limitations! As visible in the code, we need to create a Document object that wraps the node, otherwise the XML writer will run into issues. If you intend to manipulate this DOM and afterwards send it to another active XMLEventWriter (as I was trying to do) using the Transformer again, it will fail. This is because the Transformer will send a START_DOCUMENT event to the writer, which has already started. I tried the same approach the other way round, i.e. wrapping the node in a DOMSource, sending it to another XMLEventReader and piping the events to my existing XMLEventWriter, but that doesn't work either, as XMLEventReader apparently supports only StreamSources (see here).

In summary, if you only need the DOM objects, this could work well, but if you're trying to transform XML fragments by piping the events to a writer (as I do), you could run into issues.
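One possible way around the piping limitation, which is not something tried in the answer above, is to serialize the DOM node to text first and then re-read that text with a stream-based XMLEventReader, forwarding every event except START_DOCUMENT and END_DOCUMENT to the writer that is already open. The sketch below is only an illustration under those assumptions; the class `DomToActiveWriterSketch` and its methods are hypothetical names, and the approach buffers the whole fragment in memory, so it suits small fragments rather than huge subtrees.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only.
class DomToActiveWriterSketch {

  // Serializes the node, re-reads it as events, and forwards everything except
  // the document start/end events to a writer that is already mid-document.
  static void pipeNode(Node node, XMLEventWriter activeWriter) throws Exception {
    StringWriter buffer = new StringWriter();
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    serializer.transform(new DOMSource(node), new StreamResult(buffer));

    XMLEventReader fragment = XMLInputFactory.newInstance()
        .createXMLEventReader(new StringReader(buffer.toString()));
    while (fragment.hasNext()) {
      XMLEvent event = fragment.nextEvent();
      if (!event.isStartDocument() && !event.isEndDocument()) {
        activeWriter.add(event);
      }
    }
    fragment.close();
  }

  // Demo: the writer has already emitted <root> before the DOM node is piped in.
  static String demo() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element item = doc.createElement("item");
      item.setTextContent("edited");
      doc.appendChild(item);

      StringWriter out = new StringWriter();
      XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
      XMLEventFactory events = XMLEventFactory.newInstance();
      writer.add(events.createStartElement("", "", "root"));
      pipeNode(item, writer);
      writer.add(events.createEndElement("", "", "root"));
      writer.flush();
      writer.close();
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException("could not pipe DOM node", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The extra serialize-and-parse round trip costs time and memory, but it avoids handing the already-started writer to a Transformer, which is where the duplicate START_DOCUMENT described above comes from.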
