
Reading a list of XML elements in Java

I would like to iterate over an XML document that is essentially a list of identically structured XML elements. The elements will be deserialized into Java objects.

<root>
    <element attribute="value" />
    <element attribute="value" />
    <element attribute="value" />
    ...
</root>

There are a lot of elements within the root element, and I would prefer not to load them all into memory. I realize I could use a SAX handler for this, but using a SAX handler to deserialize everything into Java objects seems rather awkward. I find JDOM very easy to use, but as far as I can tell JDOM always parses the entire tree. Is there a way I can use JDOM to parse the subelements one at a time?

Another reason for using JDOM is that it makes writing serialization/deserialization code easy for the corresponding Java objects, which are meaningless unless they are entirely in memory. However, I don't want to load all of the Java objects into memory at the same time. Rather, I want to iterate over them once.

Update: here is an example of how to do this in dom4j: http://docs.codehaus.org/display/GROOVY/Reading+XML+with+Groovy+and+DOM4J. Is there any way to do this in JDOM?

Why not use StAX (javax.xml.stream.*; an implementation is included in Java SE 6) to stream in the XML and convert individual portions to objects?

import java.io.FileReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Element.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();

        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
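        xsr.nextTag(); // advance to <root>
        xsr.nextTag(); // advance to the first <element>, or to </root> if the list is empty
        while(xsr.getEventType() == XMLStreamReader.START_ELEMENT) {
            Element element = (Element) unmarshaller.unmarshal(xsr);
            System.out.println(element.getAttribute());
            // unmarshal leaves the reader just past </element>;
            // skip any whitespace so we land on the next tag
            if(!xsr.isStartElement() && !xsr.isEndElement()) {
                xsr.nextTag();
            }
        }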
    }

}

In the above example each individual "element" is unmarshalled into a POJO using JAXB (an implementation is included in Java SE 6), but you could process the fragment as you see fit. The JAXB model is shown below:

import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Element {

    private String attribute;

    @XmlAttribute
    public String getAttribute() {
        return attribute;
    }

    public void setAttribute(String attribute) {
        this.attribute = attribute;
    }

}

Note:

StAX and JAXB are also compatible with Java SE 5; you just need to download the implementations separately. From Java 11 onward, JAXB is no longer bundled with the JDK and has to be added as a dependency.
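
If JAXB is more than you need, the same streaming idea can be sketched with StAX alone. This is only a minimal sketch, not part of the answer above: `ElementIterator` is a hypothetical helper name, and it assumes the layout from the question (`element` tags carrying an `attribute` attribute). It hands back one value at a time, so only the current element is held in memory:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: lazily yields the "attribute" value of each <element>. */
class ElementIterator implements Iterator<String> {

    private final XMLStreamReader reader;
    private boolean found; // true while an unread <element> is buffered
    private String value;  // attribute value of the buffered <element> (may be null)

    ElementIterator(Reader in) {
        try {
            reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not open XML stream", e);
        }
        advance();
    }

    /** Moves the cursor to the next <element> start tag, or to the end of the document. */
    private void advance() {
        found = false;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "element".equals(reader.getLocalName())) {
                    // This is the place where a real POJO would be built.
                    value = reader.getAttributeValue(null, "attribute");
                    found = true;
                    return;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    @Override
    public boolean hasNext() {
        return found;
    }

    @Override
    public String next() {
        if (!found) {
            throw new NoSuchElementException();
        }
        String current = value;
        advance(); // read ahead exactly one element
        return current;
    }

    public static void main(String[] args) {
        String xml = "<root><element attribute=\"a\"/><element attribute=\"b\"/></root>";
        Iterator<String> it = new ElementIterator(new StringReader(xml));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

For a real file you would pass a `FileReader` (or an `InputStream` variant) instead of the `StringReader` used in `main`, and close it when the iteration is done.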

You should use VTD-XML; it is mainly used for stream processing. I use it to read product feeds from advertisers.

The great advantage is that it only needs an XPath expression: it can iterate over the XML at blazing speed and has a very small memory footprint (it keeps only a few pointers while iterating over the XML).

I know the site says it performs 5 to 12 times faster than DOM parsing, but in my experience, for your kind of task (especially if the size is in the hundreds of MB), you can easily get a 20x speedup.

Here is a simple example of how to read your XML using VTD-XML:

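import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

// FILE_LOCATION is the path to the XML file
VTDGen vg = new VTDGen();
AutoPilot ap = new AutoPilot();
int i;
ap.selectXPath("/root/element");
if (vg.parseFile(FILE_LOCATION, true)) { // true enables namespace awareness
    VTDNav vn = vg.getNav();
    ap.bind(vn); // apply XPath to the VTDNav instance
    // AutoPilot moves the cursor for you
    while ((i = ap.evalXPath()) != -1) {
        int attr = vn.getAttrVal("attribute"); // -1 if the attribute is absent
        if (attr != -1) {
            System.out.println("the attribute index val is " +
                i + " the attribute string ==>" + vn.toString(attr));
        }
    }
}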

Short answer: No. JDOM is about parsing XML and turning it into a data structure to perform operations on. This means always deserializing the entire XML document.

One easy approach that would cut down on your memory requirements would be to use XPath with JDOM to query a subset of your XML and get only those bits that satisfy your query.
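
As a rough illustration of the XPath idea, here is a sketch that uses the JDK's built-in javax.xml.xpath API instead of JDOM (swapped in only so that the example needs no extra library; `XPathSubset` and `select` are hypothetical names). Keep in mind that the expression is evaluated against a fully parsed tree, so this limits what you keep afterwards rather than what the parser reads:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSubset {

    /** Returns the node values matched by the expression, e.g. attribute values. */
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // The whole input is parsed before the expression is evaluated.
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression or unparsable XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><element attribute=\"a\"/><element attribute=\"b\"/></root>";
        // Keep only the attribute values of elements that satisfy the predicate.
        System.out.println(select(xml, "/root/element[@attribute='b']/@attribute"));
    }
}
```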

Otherwise you could check out this interesting hint from Elliotte Rusty Harold; it indicates that the streaming API you want is there, just not advertised:

JDOM does have a streaming API. It's just sort of hidden and not widely advertised or explained. In XOM, I made this approach a lot more explicit and documented it. If a streaming tree model is what you want, you're probably better off with XOM, but if you have to stick with JDOM then reading the XOM examples will probably give you enough clue about how to use JDOM in streaming mode.
