
code throws std::bad_alloc, not enough memory or can it be a bug?

I am parsing using a pretty large grammar (1.1 GB; it's data-oriented parsing). The parser I use (bitpar) is said to be optimized for highly ambiguous grammars. I'm getting this error:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
dotest.sh: line 11: 16686 Aborted                 bitpar -p -b 1 -s top -u unknownwordsm -w pos.dfsa /tmp/gsyntax.pcfg /tmp/gsyntax.lex arbobanko.test arbobanko.results

Is there hope? Does it mean that it has run out of memory? It uses about 15 GB before it crashes. The machine I'm using has 32 GB of RAM, plus swap as well. It crashes before outputting a single parse tree; I think it crashes after reading the grammar, during an attempt to construct a chart parse for the first sentence.

The parser is an efficient CYK chart parser using bit vector representations; I presume it is already pretty memory-efficient. If it really requires too much memory I could sample from the grammar rules, but this will of course decrease parse accuracy.

I think the problem is probably that I have a very large number of non-terminals; I should probably look for a different parser (any suggestions?).

UPDATE: for posterity's sake, I found the problem a long time ago. The grammar was way too big due to a bug, so the parser couldn't handle it with the available memory. With the correct grammar (an order of magnitude smaller) it works fine.

It is possible that memory becomes fragmented. That means that your program can fail to allocate 1 KB, even though 17 GB of memory is free, when those 17 GB are fragmented into 34 million free chunks of 512 bytes each.

There's of course the possibility that your program miscalculates a memory allocation. A common bug is trying to allocate -1 bytes of memory. As memory sizes are always positive, that's interpreted as size_t(-1), much more than 32 GB. But nothing here really points in that direction.

To solve this problem, you will need someone who speaks C++. If it's indeed memory fragmentation, a good C++ programmer can tailor the memory allocation strategy to your specific needs. Some strategies include keeping same-sized objects together and replacing strings with shims.
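"Keeping same-sized objects together" usually means a fixed-size block pool: carve many equal-sized blocks out of a few large slabs, and recycle freed blocks through a free list instead of returning them to the general-purpose heap. A minimal sketch (the class and its parameters are illustrative, not part of bitpar):

```cpp
#include <cstddef>
#include <new>
#include <vector>

class FixedPool {
public:
    explicit FixedPool(std::size_t object_size,
                       std::size_t blocks_per_slab = 4096)
        : per_slab_(blocks_per_slab) {
        // Round the block size up so every free block can hold an
        // aligned free-list pointer.
        const std::size_t min = sizeof(void*);
        block_ = object_size < min ? min : object_size;
        block_ = (block_ + min - 1) / min * min;
    }

    ~FixedPool() {
        for (char* slab : slabs_) ::operator delete(slab);
    }

    void* allocate() {                 // O(1): pop from the free list
        if (!free_list_) grow();
        void* p = free_list_;
        free_list_ = *static_cast<void**>(free_list_);
        return p;
    }

    void deallocate(void* p) {         // O(1): push onto the free list
        *static_cast<void**>(p) = free_list_;
        free_list_ = p;
    }

private:
    void grow() {                      // one big slab, not many tiny mallocs
        char* slab = static_cast<char*>(::operator new(block_ * per_slab_));
        slabs_.push_back(slab);
        for (std::size_t i = 0; i < per_slab_; ++i)
            deallocate(slab + i * block_);
    }

    std::size_t block_;
    std::size_t per_slab_;
    void* free_list_ = nullptr;
    std::vector<char*> slabs_;
};
```

Because every chart edge (say) comes from the same pool, freeing and reallocating millions of them cannot fragment the rest of the heap.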

If your application uses a 32-bit memory model, then each process gets 4 GB of virtual address space, of which only 2 GB is available for user space.

I suspect your parser might be trying to allocate more than the available virtual memory. I am not sure whether the parser provides a mechanism for custom memory allocation. If so, you could try using memory-mapped files for allocation and bring data into memory only when it is needed.
