
Stax: How to start parsing from a certain position in an XML file?

I have a very large XML file (500 MB). Is it possible to keep track of the position of the last parsed element in this case? So, say, if I have successfully parsed half of it, or the JVM has crashed abruptly, I can resume immediately from the position where I left off last time.

You could presumably write some form of history store that holds the structure up to the point you've parsed. However, I suspect that to continue parsing from that point you would have to turn off all forms of validation in your parser: XML is intended to guarantee the structure and contents of a document from head to foot, and it's not really designed for ad-hoc parsing.
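As a rough illustration of such a history store, here is a minimal StAX sketch that records the id of every top-level child once its end tag has been seen. The class name `CheckpointingParser`, the `id` attribute, and the use of an in-memory list as the store are assumptions for the example; a real store would be written to disk so that it survives a JVM crash.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CheckpointingParser {

    /**
     * Streams through the document and appends the id of each direct child
     * of the root element to `store` once that child has been fully parsed.
     * `store` stands in for a persistent history store (hypothetical).
     */
    static void parse(Reader in, List<String> store) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int depth = 0;                 // 1 = root element, 2 = direct child of root
        String currentChildId = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        currentChildId = reader.getAttributeValue(null, "id");
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2) {
                        store.add(currentChildId);   // checkpoint: this child is complete
                    }
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        // Simulate an interrupted parse with a truncated document.
        String truncated = "<root><child id=\"1\"><subchild id=\"1\"/></child><child id=\"2\">";
        List<String> store = new ArrayList<>();
        try {
            parse(new StringReader(truncated), store);
        } catch (XMLStreamException e) {
            System.out.println("Parsing stopped early: " + e.getMessage());
        }
        System.out.println("Completed children: " + store);
    }
}
```

`reader.getLocation().getCharacterOffset()` could be recorded alongside each id, but the StAX API allows it to return -1 when no offset is available, so I would not rely on it without checking your parser implementation.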

In your case you would still need to provide some form of context, perhaps by keeping the current working element tree in memory, concatenating it with the relevant header information, and parsing as if you were starting over with a new file, submitting only the outstanding content instead of the whole file.

E.g., given the XML structure:

<root>
  <child id="1">
    <subchild id="1"/>
  </child>
  <child id="2">
    <subchild id="2"/>
    <subchild id="3"/>
  </child>
</root>

If your parser crashes after parsing the <child id="1"> element, you need to craft a new pseudo-document containing a <root> element, and also keep a note that you have already parsed child 1 when you resume processing, in case of any dependency issues.
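A minimal sketch of that pseudo-document idea in Java, assuming you saved the byte offset just past the last fully processed child: skip to that offset, prepend a `<root>` start tag, and hand the combined stream to StAX. The class name `ResumeFromOffset`, the `resumeOffset` parameter, and the UTF-8 encoding are assumptions for the example. A real file may also need its XML declaration or namespace declarations in the prepended header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ResumeFromOffset {

    /**
     * Parses the rest of `source` starting at byte `resumeOffset`, which is
     * assumed to point just past the last fully processed <child> element.
     * Returns the ids of the <child> elements found in the remainder.
     */
    static List<String> resume(InputStream source, long resumeOffset)
            throws IOException, XMLStreamException {
        // Skip the part of the file that was already processed.
        long remaining = resumeOffset;
        while (remaining > 0) {
            long skipped = source.skip(remaining);
            if (skipped <= 0) {
                throw new IOException("Could not skip to offset " + resumeOffset);
            }
            remaining -= skipped;
        }

        // Pseudo-document: a fresh <root> start tag followed by the outstanding content.
        InputStream header = new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
        InputStream pseudoDocument = new SequenceInputStream(header, source);

        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(pseudoDocument, "UTF-8");
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "child".equals(reader.getLocalName())) {
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }
}
```

For the structure above, calling `resume` with the offset where `<child id="2">` begins would parse only child 2. The caller still has to remember separately that child 1 was already handled.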
