

More generic way to parse XML from Java in a streaming fashion?

I need to efficiently parse potentially very large XML files, so I cannot load a whole file into memory. I've looked into streaming techniques like XMLStreamReader, but they seem very low-level and lead to heavily hard-coded code like this:

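   // Typical hand-written StAX loop: every element of interest is hard-coded here
   while (parser.hasNext()) {
       int event = parser.next();
       switch (event) {
           case XMLStreamConstants.START_ELEMENT:
               String elementName = parser.getLocalName();
               if (elementName.equals("name")) {
                   state = FOUND_A_NAME;
               } else if (elementName.equals("address")) {
                   state = FOUND_AN_ADDRESS;
               }
               // ... one more branch per element we care about
               break;
           // ... CHARACTERS, END_ELEMENT and other event types
       }
   }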

I am looking for a way to do this without tightly coupling the parser to the structure being parsed. Besides, this code just doesn't feel right: it seems like it should be more truly event-oriented.

Any advice?

SAX has events that do exactly what you think they should. :) http://www.saxproject.org/quickstart.html shows a simple code base that does just that. Am I missing something?
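For illustration, here is a minimal sketch using the SAX API that ships with the JDK (`javax.xml.parsers` plus `org.xml.sax.helpers.DefaultHandler`). The class name `NameCollector`, its `collect` helper and the sample XML are hypothetical. The parser pushes `startElement` / `characters` / `endElement` callbacks to the handler, and the handler only reacts to the `<name>` elements it cares about. Nothing but the collected names is kept in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <name> element as the parser streams events.
class NameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <name> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so qName carries the element name
        if ("name".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName) && text != null) {
            names.add(text.toString());
            text = null;
        }
    }

    // Parses the stream and returns the collected names; checked exceptions are wrapped.
    static List<String> collect(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            NameCollector handler = new NameCollector();
            parser.parse(in, handler);
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people><person><name>Ann</name><address>1 Main St</address></person>"
                   + "<person><name>Bob</name></person></people>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(collect(in)); // [Ann, Bob]
    }
}
```

Accumulating text in a `StringBuilder` matters because SAX is allowed to split one element's text across several `characters` calls.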

If you're looking for a higher-level language for processing XML in streaming mode, and you don't mind being on the bleeding edge, consider the streaming facilities in Saxon-EE 9.3's XSLT processor, a partial implementation of the draft XSLT 3.0 specification.

http://www.saxonica.com/documentation/sourcedocs/streaming.xml

This can be written generically. For example, I have a properties file that maps XML element names to class field names / HashMap key names.

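// Inside the XMLEventReader loop; XMLElementName / classFieldName come from the properties mapping
if (event.isStartElement()) {
    if (event.asStartElement().getName().getLocalPart().equals(XMLElementName)) {
        // getElementText() reads the element's full text even if it is split across
        // several CHARACTERS events, and returns "" for an empty element
        // (a bare nextEvent().asCharacters() would throw on <name/>)
        fields.put(classFieldName, eventReader.getElementText());
        continue;
    }
}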

This lets us use a single parser for different XML messages. It's just an idea, and it can be taken further.
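A self-contained sketch of that idea follows, assuming the mapping is loaded into a `java.util.Properties` (element name → field/map key) and the captured values go into a `Map<String, String>`. The class `MappedFieldReader`, its `read` method and the sample keys are hypothetical names for illustration, not taken from the original code. The loop itself never mentions a concrete element name. Which elements are captured, and under which key, comes entirely from the mapping.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical generic reader driven by a Properties mapping instead of hard-coded element names.
class MappedFieldReader {
    private final Properties mapping; // XML element name -> field / map key name

    MappedFieldReader(Properties mapping) {
        this.mapping = mapping;
    }

    // Streams through the input; only mapped, text-only elements are captured.
    Map<String, String> read(Reader in) {
        Map<String, String> fields = new HashMap<>();
        try {
            XMLEventReader eventReader = XMLInputFactory.newInstance().createXMLEventReader(in);
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                if (event.isStartElement()) {
                    String elementName = event.asStartElement().getName().getLocalPart();
                    String fieldName = mapping.getProperty(elementName);
                    if (fieldName != null) {
                        // Reads the whole text of a text-only element and consumes its end tag
                        // (throws XMLStreamException if the element has child elements)
                        fields.put(fieldName, eventReader.getElementText());
                    }
                }
            }
            eventReader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties mapping = new Properties();
        mapping.setProperty("name", "customerName");
        mapping.setProperty("address", "customerAddress");
        String xml = "<customer><name>Ann</name><address>1 Main St</address></customer>";
        Map<String, String> fields = new MappedFieldReader(mapping).read(new StringReader(xml));
        System.out.println(fields.get("customerName")); // Ann
    }
}
```

Supporting a new message type then means editing the properties file rather than the parser. Because `read` takes a `Reader`, a large file can be streamed (e.g. through a `FileReader`) instead of being loaded as a string.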

I don't think the tight coupling in your code has anything to do with StAX; it's just the way you've chosen to write it.

You could easily refactor that code to delegate event handling to handler objects, using a lookup table that maps, for example, element names to handlers. This mechanism could be entirely generic and reusable.
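One way that refactoring could look is sketched below with the JDK's `XMLStreamReader`. The `ElementHandler` interface, the `DispatchingParser` class and its `on(...)` registration method are hypothetical names chosen for illustration. The dispatch loop only consults a `Map` from element name to handler, so it contains no knowledge of `name` or `address`. Callers register whatever handlers a given document needs.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical callback, invoked while the reader is positioned on a START_ELEMENT.
interface ElementHandler {
    void onStart(XMLStreamReader reader) throws XMLStreamException;
}

// Generic dispatch loop: it knows only the lookup table, not the document structure.
class DispatchingParser {
    private final Map<String, ElementHandler> handlers = new HashMap<>();

    DispatchingParser on(String elementName, ElementHandler handler) {
        handlers.put(elementName, handler);
        return this; // allows fluent registration
    }

    void parse(Reader in) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    ElementHandler handler = handlers.get(reader.getLocalName());
                    if (handler != null) {
                        handler.onStart(reader); // the handler may consume the element's content
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        List<String> addresses = new ArrayList<>();
        new DispatchingParser()
            .on("name", r -> names.add(r.getElementText()))
            .on("address", r -> addresses.add(r.getElementText()))
            .parse(new StringReader("<p><name>Ann</name><address>1 Main St</address></p>"));
        System.out.println(names + " " + addresses); // [Ann] [1 Main St]
    }
}
```

The handlers here are lambdas for brevity. In a larger application they could be full classes that build domain objects, and the same `DispatchingParser` could be reused unchanged across message formats.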


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM