Does C code on Linux run differently under gdb than when run standalone?

Question

I have built plain C code on Linux (Fedora) using the CodeSourcery tool-chain. This is for an ARM Cortex-A8 target. The code runs on a Cortex-A8 board running embedded Linux.

When I run this code for a test case that does a large dynamic memory allocation (malloc of 10MB), it crashes after some time with the error messages below:

select 1 (init), adj 0, size 61, to kill
select 1030 (syslogd), adj 0, size 64, to kill
select 1032 (klogd), adj 0, size 74, to kill
select 1227 (bash), adj 0, size 378, to kill
select 1254 (ppp), adj 0, size 1069, to kill
select 1255 (TheoraDec_Corte), adj 0, size 1159, to kill
send sigkill to 1255 (TheoraDec_Corte), adj 0, size 1159
Program terminated with signal SIGKILL, Killed.

Then, when I debug the same test case using gdb built for the target, the code fails to allocate that memory at the point where the dynamic allocation happens, and malloc returns NULL. During a normal stand-alone run, I believe malloc should also be failing, but strangely it does not seem to return NULL; instead the program crashes and the OS kills my process.

Why is this behaviour different when run under gdb and when run without a debugger?
Why would malloc fail yet not return NULL? Could this be possible, or is the reason for the error message I am getting something else?
How do I fix this?

Thanks,

-AD

Answer 1

So, for this part of the question, there is a surefire answer:

Why would malloc fail yet not return NULL? Could this be possible, or is the reason for the error message I am getting something else?

In Linux, by default the kernel interfaces for allocating memory almost never fail outright. Instead, they set up your page table in such a way that on the first access to the memory you asked for, the CPU generates a page fault, at which point the kernel looks for physical memory to back that (virtual) page. So, in an out-of-memory situation, you can ask the kernel for memory and it will "succeed"; the allocation actually fails the first time you touch the memory it returned, and that failure kills your process. (Or perhaps some other unfortunate victim. There are some heuristics for that, which I'm not incredibly familiar with. See "oom-killer".)

For some of your other questions, the answers are less clear to me.

Why is this behaviour different when run under gdb and when run without a debugger?

It could be (just a guess really) that GDB has its own malloc and is tracking your allocations somehow. On a somewhat related point, I've frequently found that heap bugs in my code aren't reproducible under debuggers. This is frustrating and makes me scratch my head, but it's basically something I've figured one has to live with...

How do I fix this?

This is a bit of a sledgehammer solution (that is, it changes the behavior for all processes rather than just your own, and it's generally not a good idea to have your program alter global state like that), but you can write the string 2 to /proc/sys/vm/overcommit_memory. See this link that I got from a Google search.

Failing that... I'd just make sure you're not allocating more than you expect to.

Answer 2

By definition, running under a debugger is different from running standalone. Debuggers can and do hide many bugs. Compiling for debugging can add a fair amount of code, similar to compiling completely unoptimized (which is what allows you to single-step or watch variables, for example). Compiling for release, on the other hand, can remove debugging options and remove code that you needed; there are many optimization traps you can fall into. I don't know from your post who is controlling the compile options or what they are.

Unless you plan to deliver the product to be run under the debugger, you should do your testing standalone. Ideally, do your development without the debugger as well; it saves you from having to do everything twice.

It sounds like a bug in your code. Slowly re-read your code with fresh eyes, as if you were explaining it to someone, or perhaps actually explain it to someone, line by line. There may be something right there that you cannot see because you have been looking at it the same way for too long. It is amazing how often and how well that works.

It could also be a compiler bug. Doing things like printing out the return value, or not printing it, can cause the compiler to generate different code. Adding another variable and saving the result to that variable can kick the compiler into doing something different. Try changing the compiler options: reduce or remove any optimization options, reduce or remove the debugger compiler options, etc.

Is this a proven system, or are you developing on new hardware? Try running without any of the caches enabled, for example. If it works in a debugger and not standalone, and it is not a compiler bug, it can be a timing issue: single-stepping flushes the pipeline, mixes the cache up differently, and gives the cache and memory system an eternity to come up with a result, which it doesn't have in real time.

In short, there is a very long list of reasons why running under a debugger hides bugs that you cannot find until you test in an environment like the final deliverable; I have only touched on a few. Having it work in the debugger and not standalone is not unexpected, it is simply how the tools work. Based on the description you have given so far, it is likely your code, the hardware, or your tools.

The fastest way to rule out your code or the tools is to disassemble the section and inspect how the passed values and return values are handled. If the return value is optimized out, there is your answer.

Are you compiling against a shared C library or a static one? Perhaps compile for static...
