How to merge (join) two different XML files by node name-value in VTD-XML?
I am new to Java. After evaluating some Java libraries I chose VTD-XML because of its performance tests and its XPath support. I also tried StAX, but I don't think it is made for human beings: it is really hard to understand how the parsing works (at least for me XD).
So, my goal is to "inject" the geo_code node from partial_geo_codes.xml into the geo_code node of accommodation.xml, matching on the value of the ext_id node in both files.
accommodation.xml
<accommodations>
  <accommodation>
    <ext_id>12345</ext_id>
    <type>A</type>
    <details>D</details>
    <geo_code />
  </accommodation>
And this is the file to be merged into accommodation.xml:
partial_geo_codes.xml
<geo_codes>
  <geo_code>
    <ext_id>12345</ext_id>
    <geo_idlocacion>77500</geo_idlocacion>
    <latitude>42.578114</latitude>
    <longitude>1.648293</longitude>
  </geo_code>
  <geo_code>
    ...
  </geo_code>
  <geo_code>
    ...
  </geo_code>
</geo_codes>
This is the expected output:
accommodation_new.xml
<accommodations>
  <accommodation>
    <ext_id>12345</ext_id>
    <type>A</type>
    <details>D</details>
    <geo_code>
      <ext_id>12345</ext_id>
      <geo_idlocacion>77500</geo_idlocacion>
      <latitude>42.578114</latitude>
      <longitude>1.648293</longitude>
    </geo_code>
  </accommodation>
  <accommodation>
    .....
  </accommodation>
  ......
</accommodations>
And this is my "wannabe-really-sucks" Java class:
import com.ximpleware.extended.*;
import java.io.*;

public class MergeVtd {
    public static void main(String args[]) throws Exception {
        String filesPath = new java.io.File("").getAbsolutePath().concat("/main/src/");
        long start = System.currentTimeMillis();
        //init original xml
        VTDGenHuge vgh = new VTDGenHuge();
        //init tobemerged xml
        VTDGenHuge vgm = new VTDGenHuge();
        if (vgm.parseFile(filesPath.concat("partial_geo_code.xml"), true, VTDGenHuge.MEM_MAPPED)) {
            VTDNavHuge vnm = vgm.getNav();
            AutoPilotHuge apm = new AutoPilotHuge(vnm);
            apm.selectElement("ext_id");
            int count = 0;
            while (apm.iterate()) {
                int t = vnm.getText();
                if (t != -1) {
                    System.out.println("Value vnm ==> " + vnm.toNormalizedString(t));
                    //we have id to match....
                    if (vgh.parseFile(filesPath.concat("accommodation.xml"), true, VTDGenHuge.MEM_MAPPED)) {
                        VTDNavHuge vnh = vgh.getNav();
                        AutoPilotHuge aph = new AutoPilotHuge(vnh);
                        aph.selectXPath("/accommodations/accommodation/ext_id[text()='" + vnm.toNormalizedString(t) + "']");
                        int result = -1;
                        while ((result = aph.evalXPath()) != -1) {
                            int g = vnh.getText();
                            if (g != -1) {
                                System.out.println("Value vnh ==> " + vnh.toNormalizedString(g));
                            } else {
                                System.out.println("no match in vnh !======= ");
                            }
                        }
                    }
                }
                System.out.println("============================== " + count);
                count++;
            }
        }
        long end = System.currentTimeMillis();
        System.out.println("Execution time was " + (end - start) + " ms.");
        System.exit(0);
    }
}
I would really appreciate any clue on how to iterate over the two XML files at once and merge them by ext_id node value much faster; right now it takes far too much time.
How big is partial_geo_codes.xml? Can it fit in memory? If so, I would recommend indexing it with a hash map: create a simple HashMap and store references to the geo_code nodes in it, keyed by their ext_id values.
Having done that, you only need to pass over accommodation.xml once. Right now your algorithm's complexity is O(n^2), and what's worse, it re-parses accommodation.xml from disk n times, once for every ext_id in the other file.
The HashMap version takes O(n) time and needs only a single pass through each of the two XML files.
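To make the indexing idea concrete, here is a minimal sketch. Note that it swaps in the JDK's built-in DOM parser instead of VTD-XML, purely so that it runs without extra libraries; the class name GeoCodeMergeSketch and the helper names are made up for illustration. DOM loads both documents fully into memory, so this only applies when the files fit. With VTD-XML the same shape should carry over: one pass to fill the map, one pass to look up and insert.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class: illustrates the HashMap index with JDK DOM, not VTD-XML.
public class GeoCodeMergeSketch {

    // First direct child element with the given name, or null if there is none.
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    // Trimmed text of the first direct child with the given name, or null.
    static String childText(Element parent, String name) {
        Element child = firstChild(parent, name);
        return child == null ? null : child.getTextContent().trim();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static String merge(String accommodationsXml, String geoCodesXml) throws Exception {
        // Pass 1: index every geo_code element by its ext_id value.
        Map<String, Element> index = new HashMap<>();
        NodeList geoCodes = parse(geoCodesXml).getElementsByTagName("geo_code");
        for (int i = 0; i < geoCodes.getLength(); i++) {
            Element geo = (Element) geoCodes.item(i);
            String id = childText(geo, "ext_id");
            if (id != null) {
                index.put(id, geo);
            }
        }

        // Pass 2: walk the accommodations once; each lookup is a hash probe.
        Document accDoc = parse(accommodationsXml);
        NodeList accs = accDoc.getElementsByTagName("accommodation");
        for (int i = 0; i < accs.getLength(); i++) {
            Element acc = (Element) accs.item(i);
            Element geo = index.get(childText(acc, "ext_id"));
            if (geo == null) {
                continue; // no geo data for this ext_id, leave the node untouched
            }
            Node imported = accDoc.importNode(geo, true); // deep copy into target document
            Element placeholder = firstChild(acc, "geo_code");
            if (placeholder != null) {
                acc.replaceChild(imported, placeholder);
            } else {
                acc.appendChild(imported);
            }
        }

        // Serialize the merged document back to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(accDoc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String acc = "<accommodations><accommodation><ext_id>12345</ext_id>"
                + "<type>A</type><details>D</details><geo_code/></accommodation></accommodations>";
        String geo = "<geo_codes><geo_code><ext_id>12345</ext_id>"
                + "<latitude>42.578114</latitude></geo_code></geo_codes>";
        System.out.println(merge(acc, geo));
    }
}
```

The key point is that neither file is parsed inside a loop over the other: each is read once, and the match on ext_id is a map lookup instead of an XPath query over the whole document.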