
How to merge (join) two different XML files by node name-value in VTD-XML?

I am a newbie in Java. After evaluating some Java libraries I chose VTD-XML for its performance tests and the option to use XPath. I tried StAX and I think it is not for human beings: it is really hard to understand how the parsing works (at least for me XD).

So, my goal is to "inject" the geo_code node from partial_geo_codes.xml into the geo_code node of accommodation.xml, matching on the value of the ext_id node in both files.

accommodation.xml

<accommodations>
 <accommodation>
  <ext_id>12345</ext_id>
  <type>A</type>
  <details>D</details>
  <geo_code />
 </accommodation>
</accommodations>

And this is the file to be merged into accommodation.xml:

partial_geo_codes.xml

<geo_codes>
 <geo_code>
  <ext_id>12345</ext_id>
  <geo_idlocacion>77500</geo_idlocacion>
  <latitude>42.578114</latitude>
  <longitude>1.648293</longitude>
 </geo_code>
 <geo_code>
      ...
 </geo_code>
 <geo_code>
      ...
 </geo_code>
</geo_codes>

This is the expected output:

accommodation_new.xml

<accommodations>
 <accommodation>
  <ext_id>12345</ext_id>
  <type>A</type>
  <details>D</details>
  <geo_code>
    <ext_id>12345</ext_id>
    <geo_idlocacion>77500</geo_idlocacion>
    <latitude>42.578114</latitude>
    <longitude>1.648293</longitude>
  </geo_code>
  </accommodation>
  <accommodation>
   .....
  </accommodation>
  ...... 
</accommodations>

And this is my "wannabe-really-sucks" Java class:

import com.ximpleware.extended.*;
import java.io.*;

public class MergeVtd  {

 public static void main(String args[]) throws Exception {

    String filesPath = new java.io.File("").getAbsolutePath() .concat("/main/src/");
    long start = System.currentTimeMillis();


    //init original xml
    VTDGenHuge vgh = new VTDGenHuge();
    //init tobemerged xml
    VTDGenHuge vgm = new VTDGenHuge();


    if (vgm.parseFile(filesPath.concat("partial_geo_codes.xml"),true,VTDGenHuge.MEM_MAPPED)){

        VTDNavHuge vnm = vgm.getNav();
        AutoPilotHuge apm = new AutoPilotHuge(vnm);
        apm.selectElement("ext_id");


        int  count=0;
        while (apm.iterate()){
            int t = vnm.getText();
            if (t!=-1)    {
                System.out.println("Value vnm ==> "+vnm.toNormalizedString(t));

            //we have id to match....

            if (vgh.parseFile(filesPath.concat("accommodation.xml"),true,VTDGenHuge.MEM_MAPPED)){
                VTDNavHuge vnh = vgh.getNav();
                AutoPilotHuge aph = new AutoPilotHuge(vnh);
                aph.selectXPath("/accommodations/accommodation/ext_id[text()='" + vnm.toNormalizedString(t) + "']" );


                int result = -1;
                while ((result=aph.evalXPath())!=-1){
                    int g = vnh.getText();
                    if (g!=-1)  {
                        System.out.println("Value vnh ==> "+vnh.toNormalizedString(g));

                    }  else {
                        System.out.println("no match in vnh !======= ");
                    }
                }
            }

            }

            System.out.println("============================== " + count);
            count++;

        }

    }

    long end = System.currentTimeMillis();
    System.out.println("Execution time was "+ (end - start) +" ms.");
    System.exit(0);

 }

}

I would really appreciate any clue on how to iterate over the two XML files at once and merge them by ext_id node value much faster; right now it takes far too much time.

How big is partial_geo_codes.xml? Can it fit in memory? If yes, then I would recommend indexing it with a hash map. Just create a simple HashMap and put references to the geo_code nodes in it, using the ext_id values as keys.

Having done that, you'll need to pass over accommodation.xml only once. Right now your algorithm's complexity is O(n^2), and what's worse, it involves n reads from disk, because accommodation.xml is parsed again for every ext_id! The version with a HashMap will take O(n) time and will require only a single pass through each of the two XML files.
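
The two passes described above can be sketched roughly as follows. This is only an illustration of the HashMap idea, not a VTD-XML solution: to stay self-contained it swaps in the JDK's built-in DOM parser, so both files are assumed to fit in memory. The class name GeoCodeMerge and the helpers indexByExtId, merge, firstChild and childText are made up for this example; the element names come from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: JDK DOM stands in for VTD-XML here.
final class GeoCodeMerge {

    // Pass 1: index every geo_code element by the text of its ext_id child.
    static Map<String, Element> indexByExtId(Document geoCodes) {
        Map<String, Element> index = new HashMap<>();
        NodeList geoList = geoCodes.getElementsByTagName("geo_code");
        for (int i = 0; i < geoList.getLength(); i++) {
            Element geo = (Element) geoList.item(i);
            String extId = childText(geo, "ext_id");
            if (extId != null) {
                index.put(extId, geo);
            }
        }
        return index;
    }

    // Pass 2: visit each accommodation once and replace its empty geo_code
    // with a copy of the indexed one; the lookup is a constant-time map get.
    static void merge(Document accommodations, Map<String, Element> index) {
        NodeList accList = accommodations.getElementsByTagName("accommodation");
        for (int i = 0; i < accList.getLength(); i++) {
            Element acc = (Element) accList.item(i);
            Element match = index.get(childText(acc, "ext_id"));
            Element placeholder = firstChild(acc, "geo_code");
            if (match != null && placeholder != null) {
                // importNode copies the element into the target document.
                Node copy = accommodations.importNode(match, true);
                acc.replaceChild(copy, placeholder);
            }
        }
    }

    // First direct child element with the given name, or null if none.
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Trimmed text of the first direct child with the given name, or null.
    static String childText(Element parent, String name) {
        Element child = firstChild(parent, name);
        return child == null ? null : child.getTextContent().trim();
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Inline copies of the sample data from the question.
        String accXml = "<accommodations><accommodation><ext_id>12345</ext_id>"
                + "<type>A</type><details>D</details><geo_code/></accommodation></accommodations>";
        String geoXml = "<geo_codes><geo_code><ext_id>12345</ext_id>"
                + "<geo_idlocacion>77500</geo_idlocacion><latitude>42.578114</latitude>"
                + "<longitude>1.648293</longitude></geo_code></geo_codes>";
        Document acc = parse(accXml);
        merge(acc, indexByExtId(parse(geoXml)));
        System.out.println(serialize(acc));
    }
}
```

With VTD-XML itself the same structure should carry over: parse partial_geo_codes.xml once, store something per ext_id that lets you get back to the geo_code fragment, then walk accommodation.xml a single time and do one map lookup per accommodation. The key point is that neither file is parsed inside a loop over the other.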
