Reading large XML files one node at a time in Java with a pull parser?

I'd like to parse large XML files and read in one complete node at a time from Java. The files are too large to load into a tree. I'd like to use a pull parser if possible, since it appears to be easier to program against. Given the following XML data:
Instead of having to check every event while using the StAX parser, I'd like each call to hasNext or some similar function to return an object containing the complete information for a record node. In Perl, XML::LibXML::Reader allows me to do this using its read method, so I'm looking for an equivalent in Java.

Commons Digester is really good for this type of problem. It lets you configure parsing rules so that when the parser encounters certain tags it performs some action (e.g. calls a factory method to create an object). You don't have to write any parsing code yourself, which makes development fast and lightweight.

Below is a simple example pattern you could use:

<pattern value="myConfigFile/foos/foo">
    <factory-create-rule classname="FooFactory"/>
    <set-next-rule methodname="processFoo" paramtype="com.foo.Foo"/>
</pattern>

When the parser encounters the "foo" tag it will call createObject(Attributes) on FooFactory, which will create a Foo object. The parser will then call processFoo on the object at the top of the Digester stack (you would typically push this object onto the stack before commencing parsing). You could therefore implement processFoo either to add these objects to a collection or, if your file is too big, to process each object as it arrives and then throw it away.
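For comparison, the same "one complete record per call" idea can be sketched with only the JDK's StAX API, without Digester. This is a minimal sketch rather than a definitive implementation: the class name RecordIterator, the "record" tag name and the sample XML are all hypothetical, and it assumes each record is flat, meaning every child element contains only text.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical helper: wraps a StAX reader and yields one record element
 * at a time as a map of child element name to text content.
 * Assumes records are flat (children hold text only).
 */
class RecordIterator implements Iterator<Map<String, String>> {
    private final XMLStreamReader reader;
    private final String recordTag;
    private Map<String, String> pending; // next record to hand out, or null at end of input

    RecordIterator(XMLStreamReader reader, String recordTag) throws XMLStreamException {
        this.reader = reader;
        this.recordTag = recordTag;
        advance();
    }

    /** Skips forward to the next record start tag and reads that record. */
    private void advance() throws XMLStreamException {
        pending = null;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals(recordTag)) {
                pending = readRecord();
                return;
            }
        }
    }

    /** Reads child elements until the record's end tag is reached. */
    private Map<String, String> readRecord() throws XMLStreamException {
        Map<String, String> fields = new LinkedHashMap<>();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                // getElementText() consumes the text and the child's end tag
                fields.put(name, reader.getElementText());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals(recordTag)) {
                break;
            }
        }
        return fields;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public Map<String, String> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        Map<String, String> current = pending;
        try {
            advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return current;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample data, for illustration only
        String xml = "<records><record><id>1</id><name>a</name></record>"
                   + "<record><id>2</id><name>b</name></record></records>";
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        RecordIterator records = new RecordIterator(reader, "record");
        while (records.hasNext()) {
            System.out.println(records.next()); // process each record, then let it go
        }
    }
}
```

Only one record is held in memory at a time, so the approach should scale to large files in the same way as processing each Foo in processFoo and discarding it. Records with nested elements would need a different readRecord, since getElementText() fails on child elements.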
