
expat parser: memory consumption

I am using the expat parser to parse an XML file of around 15 GB. The problem is that it throws an "Out of Memory" error and the program aborts.

Has anybody faced a similar issue with the expat parser? Or is this a known bug that has been fixed in later versions?

I've used expat to parse large files before and never had any problems. I'm assuming you're using the SAX-style callback interface and not one of the expat DOM wrappers. If you are using DOM, then that's your problem right there: it would essentially be trying to load the whole file into memory.

Are you allocating objects as you parse the XML and maybe not deallocating them? That would be the first thing I would check for. One way to check whether the problem is really with expat: if you reduce the program to a simple version that has empty tag handlers (i.e. it just parses the file and does nothing with the results), does it still run out of memory?
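The empty-handler test suggested above can be sketched as follows. This is a minimal sketch using Python's standard `xml.parsers.expat` binding rather than the C API the question presumably uses; the function name `parse_with_empty_handlers` and the 1 MiB chunk size are assumptions made for illustration. If memory stays flat while this runs over the full file, the growth is most likely coming from the application's own handlers.

```python
import io
import xml.parsers.expat


def parse_with_empty_handlers(stream, chunk_size=1 << 20):
    """Feed a binary stream through expat with handlers that keep nothing.

    Returns the number of bytes consumed, so the caller can confirm that
    the whole input was read.
    """
    parser = xml.parsers.expat.ParserCreate()
    # Handlers are registered, but deliberately store no data.
    parser.StartElementHandler = lambda name, attrs: None
    parser.EndElementHandler = lambda name: None
    parser.CharacterDataHandler = lambda data: None

    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        parser.Parse(chunk, False)  # more input follows
    parser.Parse(b"", True)         # signal end of document
    return total


if __name__ == "__main__":
    # Small in-memory stand-in for the real file (hypothetical content).
    sample = b"<root>" + b"<item id='1'>text</item>" * 1000 + b"</root>"
    print(parse_with_empty_handlers(io.BytesIO(sample), chunk_size=4096))
```

For the real file, open it in binary mode and pass the file object in place of the `io.BytesIO` stand-in, then watch the process's memory with whatever tool your platform provides.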

Expat has leaks. I've started using it in a long-running server, and I'm finding that it consistently leaks memory, whether the parser is freed or not. More recent versions of xmlparse.c do not resolve this problem; they only hide existing leaks.

I don't know expat at all, but I'd guess that it's having to hold too much state in memory for some reason. Is the XML malformed in some way? Do you have handlers registered for end tags of large blocks?

I'm thinking that if you have a handler registered for the end of a large block, and expat is expected to pass the block to the handler, then expat could be running out of memory before it's able to gather that block completely. As I said, I don't know expat, so this might not be possible; I'm just asking.
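For what it's worth, expat's end-element callback receives only the element name, and text is delivered separately through the character-data callback, possibly in several pieces. A large block therefore only accumulates in memory if the application's handlers store it. The sketch below illustrates this with Python's standard `xml.parsers.expat` binding; the function name `text_lengths` and the tiny chunk size are assumptions chosen to force text to be split across feeds.

```python
import xml.parsers.expat


def text_lengths(xml_bytes, chunk_size=8):
    """Return (element name, direct text length) pairs without storing text."""
    parser = xml.parsers.expat.ParserCreate()
    results = []
    stack = []  # running character counts, one entry per open element

    def start(name, attrs):
        stack.append(0)

    def chars(data):
        # May be called several times for a single run of text.
        if stack:
            stack[-1] += len(data)

    def end(name):
        # The end handler gets only the element name, never its content.
        results.append((name, stack.pop()))

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end

    # Feed the document in small slices to mimic reading a file in chunks.
    for i in range(0, len(xml_bytes), chunk_size):
        parser.Parse(xml_bytes[i:i + chunk_size], False)
    parser.Parse(b"", True)
    return results


if __name__ == "__main__":
    print(text_lengths(b"<a><b>hello world</b><c>xy</c></a>"))
```

The `results` list is kept here only so the sketch has something to return; on a 15 GB input you would write each pair out or aggregate it rather than collect one entry per element.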

Alternatively, are you sure that expat is where the memory is being lost? I could imagine a situation where you were keeping some information about the contents of the XML file, and your own data structures caused the out-of-memory condition, either because the data was so large or because of memory leaks in your code.
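One way to keep your own data structures bounded is to hand each record to a consumer as soon as its end tag arrives and then drop it. The following is a minimal sketch with Python's standard `xml.parsers.expat` binding. It assumes a flat, hypothetical layout in which each `<record>` element contains only simple child elements with text; the names `stream_records`, `on_record` and `record_tag` are made up for the example.

```python
import xml.parsers.expat


def stream_records(chunks, on_record, record_tag="record"):
    """Call on_record(dict) for each record element, then release the data.

    Assumes records are flat: each child of a record holds only text.
    """
    parser = xml.parsers.expat.ParserCreate()
    current = None  # fields of the record being parsed, or None
    field = None    # name of the child element currently being read

    def start(name, attrs):
        nonlocal current, field
        if name == record_tag:
            current = {}
        elif current is not None:
            field = name
            current[field] = ""

    def chars(data):
        # Text may arrive in pieces, so append rather than assign.
        if current is not None and field is not None:
            current[field] += data

    def end(name):
        nonlocal current, field
        if name == record_tag:
            on_record(current)
            current = None  # nothing from this record is retained
        elif name == field:
            field = None

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end

    for chunk in chunks:
        parser.Parse(chunk, False)
    parser.Parse(b"", True)


if __name__ == "__main__":
    doc = [b"<root><record><id>1</id><name>a",
           b"b</name></record><record><id>2</id></record></root>"]
    stream_records(doc, print)
```

With this shape, memory use depends on the size of the largest single record rather than on the size of the file, provided `on_record` itself does not keep every record it sees.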

Expat is an event-driven parser which does not construct large in-memory structures. So the problem is probably not expat (which is very widely used for parsing large files); much more likely it is your own code.
