

VTD-XML (Java) VTDNavHuge: write XPath result to file

I am experimenting with VTD-XML because I frequently need to modify huge XML files (2-10 GB or more).

I am trying to write an XPath query result back to a file. However, it is not obvious to me how to write huge files with VTD-XML:

  1. The method getBytes() is "not implemented" for XMLMemMappedBuffer (see https://jar-download.com/javaDoc/com.ximpleware/vtd-xml/2.13/com/ximpleware/extended/XMLMemMappedBuffer.html).

  2. One of the authors (?) gives a code example in this thread (last post, 2010-04-21): https://sourceforge.net/p/vtd-xml/discussion/379067/thread/a2e03ede/

However, the example is outdated, as

long la = vnh.getElementFragment();

returns an array of type long[] (see https://jar-download.com/java-documentation-javadoc.php?a=vtd-xml&g=com.ximpleware&v=2.13).

Adapting the relevant lines like this:

long[] la = vnh.getElementFragment();
vnh.getXML().writeToFileOutputStream(new FileOutputStream("c:/text2.xml"), (int)la[0], (int)la[1]);

results in the following error:

Exception in thread "main" java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
    at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
    at com.ximpleware.extended.XMLMemMappedBuffer.writeToFileOutputStream(XMLMemMappedBuffer.java:104)
    at WriteXML.main(WriteXML.java:16)

Questions:

  • Is this error due to an obvious mistake in the code?
  • What tools would you use to handle huge XML files (~10 GB) efficiently? (They do not have to be Java.)

My goal is to do simple transformations, or to split the XML, and write the result back to a file with good performance. Thanks!

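I can't answer your first question, but as for the second: if you are looking for a different technology, streaming XSLT 3.0 is worth exploring. Without seeing more details of your requirements, it is impossible to say whether it is really suitable.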

First of all, to process XML as large as the files you mention, I suggest that you load the XML using mem-map mode. And since VTD-XML doesn't alter the underlying byte format of the XML, you avoid a lot of back-and-forth encoding/decoding and byte-moving operations, which is where much of its performance advantage comes from.
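To illustrate what memory mapping a file of this size involves at the plain JDK level (this is not VTD-XML's own code): a single MappedByteBuffer is indexed by int and FileChannel.map accepts at most Integer.MAX_VALUE bytes, so a 2-10 GB file has to be mapped as several regions and addressed with a long offset. ChunkedMappedFile, byteAt and the chunk size are hypothetical names and parameters chosen for this sketch; the operating system pages the mapped data in on demand rather than reading the whole file up front.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical helper (not part of VTD-XML): maps a file read-only as a series of
 * regions of at most chunkSize bytes, so that files larger than 2 GB can be read
 * through a single long offset.
 */
final class ChunkedMappedFile {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;
    private final long length;

    ChunkedMappedFile(Path path, long chunkSize) throws IOException {
        // FileChannel.map accepts at most Integer.MAX_VALUE bytes per mapping.
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        this.chunkSize = chunkSize;
        try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ)) {
            this.length = fc.size();
            int n = (int) ((length + chunkSize - 1) / chunkSize); // number of regions, rounded up
            this.chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long start = i * chunkSize;
                long size = Math.min(chunkSize, length - start);
                chunks[i] = fc.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
        } // Closing the channel does not invalidate the mappings created from it.
    }

    long length() {
        return length;
    }

    /** Returns the byte at an absolute offset; the OS loads the page on first access. */
    byte byteAt(long offset) {
        if (offset < 0 || offset >= length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        return chunks[(int) (offset / chunkSize)].get((int) (offset % chunkSize));
    }
}
```

For a real 10 GB file one would pick a large chunk size, e.g. `new ChunkedMappedFile(path, 1L << 30)` for 1 GB regions; the tiny chunk sizes only make sense in tests.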

As you pointed out, XMLMemMappedBuffer's getBytes() is not implemented; this is deliberate, to avoid excessive memory usage when the fragment is very large.

The workaround is to use XMLMemMappedBuffer's writeToFileOutputStream() method to dump the fragment directly to the output. In other words, if you know the offset and length of the fragment, getBytes() can usually be bypassed.

Below is the documented signature of that method:

public void writeToFileOutputStream(java.io.FileOutputStream ost, long os, long len) throws java.io.IOException

Write the segment (denoted by its offset and length) into an output file stream.
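Conceptually, writing a fragment by offset and length is a channel-to-channel copy; the stack trace in the question shows writeToFileOutputStream calling FileChannel.transferTo. The sketch below is a JDK-only illustration of that idea, not VTD-XML's implementation, and SegmentWriter and writeSegment are hypothetical names. It keeps offset and length as long throughout. Since os and len in the signature above are long, the (int) casts in the question are unnecessary and would truncate any offset beyond Integer.MAX_VALUE (about 2 GB), assuming, as the question does, that la[0] is the offset and la[1] the length. This does not by itself explain the ClosedChannelException.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: copies bytes [offset, offset + length) of a file to an output stream. */
final class SegmentWriter {
    private SegmentWriter() {}

    static void writeSegment(Path source, FileOutputStream out, long offset, long length)
            throws IOException {
        // Open a dedicated source channel for this copy so its lifetime is explicit.
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            if (offset < 0 || length < 0 || offset + length > in.size()) {
                throw new IllegalArgumentException("segment lies outside the file");
            }
            // The caller owns 'out'; its channel is not closed here.
            FileChannel target = out.getChannel();
            long done = 0;
            // transferTo may move fewer bytes than requested, so loop until the segment is copied.
            while (done < length) {
                long n = in.transferTo(offset + done, length - done, target);
                if (n <= 0) {
                    throw new IOException("transferTo made no progress");
                }
                done += n;
            }
        }
    }
}
```

The bytes are copied by the kernel where the platform supports it, so the fragment never has to be materialized as a Java byte[], which is the same reason getBytes() is avoided for huge fragments.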
