
Efficiently unmarshaling a part of a large xml file with JAXB and XMLStreamReader

I want to unmarshal part of a large XML file. A solution for this already exists, but I want to improve it for my own implementation.

Please have a look at the following code (source):

public static void main(String[] args) throws Exception {
    XMLInputFactory xif = XMLInputFactory.newFactory();
    StreamSource xml = new StreamSource("input.xml");
    XMLStreamReader xsr = xif.createXMLStreamReader(xml);
    xsr.nextTag();

    while (!xsr.getLocalName().equals("VersionList") && xsr.getElementText().equals("1.81")) {
        xsr.nextTag();
    }

I want to unmarshal input.xml (given below) for the node with versionNumber="1.81".

With the current code, the XMLStreamReader first checks the node with versionNumber="1.80", then visits all of that node's child nodes, and only then moves on to the node with versionNumber="1.81", where the exit condition of the while loop is satisfied.

Since I only want to check the versionNumber nodes, iterating over their child nodes is unnecessary, and for a large XML file iterating over all child nodes of version 1.80 will take a long time. I want to check only the top-level VersionList nodes (versionNumber), and if the first one (versionNumber=1.80) does not match, the XMLStreamReader should jump directly to the next one (versionNumber=1.81). But this does not seem achievable with xsr.nextTag(). Is there any way to iterate through only the desired top-level nodes?

input.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fileVersionListWrapper FileName="src.h">
    <VersionList versionNumber="1.80">
        <Reviewed>
            <commentId>v1.80(c5)</commentId>
            <author>Robin</author>
            <lines>47</lines>
            <lines>48</lines>
            <lines>49</lines>
        </Reviewed>
        <Reviewed>
            <commentId>v1.80(c6)</commentId>
            <author>Sujan</author>
            <lines>82</lines>
            <lines>83</lines>
            <lines>84</lines>
            <lines>85</lines>
        </Reviewed>
    </VersionList>
    <VersionList versionNumber="1.81">
        <Reviewed>
            <commentId>v1.81(c4)</commentId>
            <author>Robin</author>
            <lines>47</lines>
            <lines>48</lines>
            <lines>49</lines>
        </Reviewed>
        <Reviewed>
            <commentId>v1.81(c5)</commentId>
            <author>Sujan</author>
            <lines>82</lines>
            <lines>83</lines>
            <lines>84</lines>
            <lines>85</lines>
        </Reviewed>
    </VersionList>
</fileVersionListWrapper>

You can get the node from the XML using XPath.

XPath, the XML Path Language, is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document.

Your XPath expression will be:

/fileVersionListWrapper/VersionList[@versionNumber='1.81']

meaning you only want the VersionList elements whose versionNumber attribute is 1.81.

Java code

I have assumed that you have the XML as a string, so you will need something like the following:

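import java.io.StringReader;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

// Parse the XML string into a DOM document (this loads the whole document into memory)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource inputSource = new InputSource(new StringReader(xml));
Document document = builder.parse(inputSource);
// Compile the XPath expression and select every matching VersionList element
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/fileVersionListWrapper/VersionList[@versionNumber='1.81']");
NodeList nl = (NodeList) expr.evaluate(document, XPathConstants.NODESET);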

Now you can simply loop through each node:

for (int i = 0; i < nl.getLength(); i++)
{
  System.out.println(nl.item(i).getNodeName());
}

To get the nodes back into XML, you will have to create a new Document and append the nodes to it.

  Document newXmlDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
  Element root = newXmlDocument.createElement("fileVersionListWrapper");
  for (int i = 0; i < nl.getLength(); i++)
  {
    Node node = nl.item(i);
    Node copyNode = newXmlDocument.importNode(node, true);
    root.appendChild(copyNode);
  }
  newXmlDocument.appendChild(root);

Once you have the new document, run a serializer over it to get the XML.

// Serialize the new document (not the original one) so only the selected nodes are written
DOMImplementationLS domImplementationLS = (DOMImplementationLS) newXmlDocument.getImplementation();
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
String xmlString = lsSerializer.writeToString(newXmlDocument);
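
The select, copy, and serialize steps above can be pulled together into one runnable sketch. The class name XPathExtract and the method extract are hypothetical names chosen for illustration, and the inline XML is a shortened stand-in for input.xml. Note that this approach still parses the entire document into a DOM first, so it may not suit very large files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class XPathExtract {

    /** Returns a new XML string containing the root element and only the nodes matched by the expression. */
    public static String extract(String xml, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(xml)));

            // Select the matching nodes from the parsed document
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, source, XPathConstants.NODESET);

            // Copy the matches under a fresh root named like the original root
            Document target = builder.newDocument();
            Element root = target.createElement(source.getDocumentElement().getTagName());
            target.appendChild(root);
            for (int i = 0; i < matches.getLength(); i++) {
                root.appendChild(target.importNode(matches.item(i), true));
            }

            // Serialize the new document, not the source document
            DOMImplementationLS ls = (DOMImplementationLS) target.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();
            return serializer.writeToString(target);
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<fileVersionListWrapper FileName=\"src.h\">"
                + "<VersionList versionNumber=\"1.80\"><Reviewed><commentId>v1.80(c5)</commentId></Reviewed></VersionList>"
                + "<VersionList versionNumber=\"1.81\"><Reviewed><commentId>v1.81(c4)</commentId></Reviewed></VersionList>"
                + "</fileVersionListWrapper>";
        System.out.println(extract(xml, "/fileVersionListWrapper/VersionList[@versionNumber='1.81']"));
    }
}
```

The returned string plays the role of xmlString in the unmarshalling step.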

Now that you have your XML string, I assume you already have a JAXB class that looks similar to this:

@XmlRootElement(name = "fileVersionListWrapper")
public class FileVersionListWrapper
{
  private ArrayList<VersionList> versionListArrayList = new ArrayList<VersionList>();

  public ArrayList<VersionList> getVersionListArrayList()
  {
    return versionListArrayList;
  }

  @XmlElement(name = "VersionList")
  public void setVersionListArrayList(ArrayList<VersionList> versionListArrayList)
  {
    this.versionListArrayList = versionListArrayList;
  }
}

You can then simply use the JAXB unmarshaller to create the objects for you:

JAXBContext jaxbContext = JAXBContext.newInstance(FileVersionListWrapper.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
StringReader reader = new StringReader(xmlString);
FileVersionListWrapper fileVersionListWrapper = (FileVersionListWrapper) jaxbUnmarshaller.unmarshal(reader);
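
If building a DOM for the whole file is too expensive, a streaming alternative closer to what the question asks for might look like the sketch below. It reads the versionNumber attribute directly from each VersionList start tag and steps over a non-matching element by counting nesting depth, without inspecting its children. The class VersionSeeker and its methods are hypothetical names for illustration, and the sketch assumes VersionList elements are direct children of the root, as in input.xml. The parser still has to read through the skipped content, since StAX is forward-only; what is saved is the per-node checking and any object or tree building for the unwanted versions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
class VersionSeeker {

    /** Moves the reader to the end tag of the element it is currently positioned on. */
    public static void skipElement(XMLStreamReader xsr) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    /** Positions the reader on the VersionList start tag with the wanted versionNumber; false if there is none. */
    public static boolean seekVersion(XMLStreamReader xsr, String wanted) throws XMLStreamException {
        xsr.nextTag(); // root element (fileVersionListWrapper in input.xml)
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            if ("VersionList".equals(xsr.getLocalName())
                    && wanted.equals(xsr.getAttributeValue(null, "versionNumber"))) {
                return true;
            }
            skipElement(xsr); // step over the whole non-matching element
        }
        return false; // reached the root's end tag without a match
    }

    /** Demo consumer: collects the commentId values inside the matching VersionList. */
    public static List<String> commentIds(String xml, String wanted) {
        List<String> ids = new ArrayList<>();
        try {
            XMLStreamReader xsr = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            if (seekVersion(xsr, wanted)) {
                int depth = 1;
                while (depth > 0) {
                    int event = xsr.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if ("commentId".equals(xsr.getLocalName())) {
                            ids.add(xsr.getElementText()); // leaves the reader on </commentId>
                        } else {
                            depth++;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        depth--;
                    }
                }
            }
            xsr.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<fileVersionListWrapper FileName=\"src.h\">"
                + "<VersionList versionNumber=\"1.80\"><Reviewed><commentId>v1.80(c5)</commentId></Reviewed></VersionList>"
                + "<VersionList versionNumber=\"1.81\"><Reviewed><commentId>v1.81(c4)</commentId><author>Robin</author></Reviewed></VersionList>"
                + "</fileVersionListWrapper>";
        System.out.println(commentIds(xml, "1.81"));
    }
}
```

Once seekVersion returns true, the reader sits on the matching start tag. Instead of the demo loop, it could be handed to JAXB with unmarshaller.unmarshal(xsr, VersionList.class), assuming a VersionList class mapped for JAXB exists.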
