Corrupt stack problem in C/C++ program

Original post: 2011-04-14 18:40:21 | Tags: c++, c, linux

I am running a C/C++ program on Linux servers to serve videos. The core functionality of the program (call it Plugin) is to convert videos, and we fork a separate Plugin process for each video request. But I am having a weird problem: sometimes the server load average gets unexpectedly high. What I see from the top command at that point is that some processes have been running for a long time and are consuming a lot of CPU.

When I attach gdb to one of these running processes and take a backtrace, what I find is a corrupt stack: "Previous frame inner to this frame (corrupt stack?)". I searched the net and found that this occurs if the program gets a segmentation fault.

But as far as I know, if the program gets a segmentation fault, it should crash and exit at that point. Surprisingly, the program is still running after the segmentation fault.

What could be the causes of this? I know there must be some big problems in the program, but I just can't figure out where to start fixing them. It would be great if any of you could shed some light on this.

Thanks in advance.

5 answers

Attaching the debugger changes the behavior of the process, so you most probably won't get reliable investigation results. A corrupted-stack message from the debugger can also mean that this particular debugger does not understand the text info in the binary.

I would recommend running pstack several times in succession on the problematic process (this is known as "Monte Carlo performance profiling"), and also attaching strace or truss to it to check which system calls the process is making while it consumes CPU.
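As a complement to sampling with pstack and strace, it can help to know whether the busy process is burning user time or system time. The following is only a sketch, not something the answer prescribes: read_cpu_ticks is a hypothetical helper, and it assumes a Linux /proc filesystem with the /proc/<pid>/stat layout described in proc(5), where utime and stime are fields 14 and 15.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the user and system CPU time (in clock ticks)
 * of process `pid` from /proc/<pid>/stat.
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_cpu_ticks(pid_t pid, unsigned long *utime, unsigned long *stime)
{
    char path[64];
    char buf[1024];

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) is parenthesised and may contain spaces,
     * so parsing resumes after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;

    /* Skip fields 3..13, then read field 14 (utime) and field 15 (stime). */
    int matched = sscanf(p + 1,
                         " %*c %*d %*d %*d %*d %*d %*u"
                         " %*lu %*lu %*lu %*lu %lu %lu",
                         utime, stime);
    return matched == 2 ? 0 : -1;
}
```

Calling this twice a few seconds apart on the busy pid gives a rough picture: if utime keeps climbing while stime stays flat, the process is probably looping in user space, and strace is likely to show few or no system calls.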

Run the program under Valgrind and fix any invalid memory writes it finds.
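To make "invalid memory write" concrete, here is a minimal sketch of one common case, an off-by-one heap write. copy_title is a hypothetical function, not taken from the Plugin program; the buggy variant is described in the comment and the code shows the corrected form.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of `s`, or NULL if allocation
 * fails. The caller owns the result and must free() it. */
char *copy_title(const char *s)
{
    size_t len = strlen(s);

    /* Buggy variant: malloc(len) leaves no room for the terminating '\0',
     * so the copy below would write one byte past the end of the block.
     * That is the kind of access Valgrind's memcheck reports as an
     * invalid write. */
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, s, len + 1);   /* copies the terminator as well */
    return out;
}
```

Building with -g and running the binary as "valgrind ./your_program" (the program name is a placeholder) lets Valgrind attribute such a write to a source line.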

Some optimizations (such as frame pointer omission) make it harder for the debugger to make sense of the stack.

If you have the code, compile the program in debug mode and run Valgrind on it.

If you don't have the code, contact the author/provider of the program.

The corrupt stack message simply means the code is doing something weird with the memory. It does not mean the program has a segmentation fault. Also, the program can keep running if it chooses to handle the SIGSEGV signal.
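The point about handling SIGSEGV can be sketched as follows. This is an illustration under assumptions, not the Plugin program's code: on_segv and survives_sigsegv are hypothetical names, and the signal is sent with raise() rather than by a real bad memory access so the example stays well defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler so the caller can tell that SIGSEGV was delivered. */
static volatile sig_atomic_t segv_seen = 0;

static void on_segv(int signo)
{
    (void)signo;
    segv_seen = 1;   /* only async-signal-safe work belongs in a handler */
}

/* Hypothetical helper: installs a SIGSEGV handler, sends SIGSEGV to the
 * current process, restores the previous handler, and returns 1 if the
 * process survived and the handler ran. */
int survives_sigsegv(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return 0;

    segv_seen = 0;
    raise(SIGSEGV);                  /* synthetic signal, no real fault */
    sigaction(SIGSEGV, &old, NULL);  /* put the previous handler back */

    return segv_seen == 1;           /* reached only because we survived */
}
```

With a genuine invalid access the picture is worse: on Linux, returning from the handler typically restarts the faulting instruction, so the fault repeats and the process can spin at high CPU instead of exiting. Whether something like that is happening here is a guess, but it is worth checking whether the program or one of its libraries installs a SIGSEGV handler.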

If by forking you mean that you have some process which spawns and runs other smaller processes, just monitor for such spikes and restart the process. This assumes that you have no way to fix the program itself.
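One way to approximate "monitor and restart" from the parent side is to cap each child's CPU time so a runaway worker is terminated by the kernel. The sketch below is an assumption about how that could look, not the poster's setup: run_with_cpu_cap, spin_forever and finish_quickly are hypothetical, and it relies on the soft RLIMIT_CPU limit delivering SIGXCPU, whose default action terminates the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a worker that runs work() with its soft CPU
 * limit lowered to cpu_seconds, then waits for it.
 * Returns the terminating signal number if the worker was killed by a
 * signal, 0 if it exited on its own, and -1 if fork or waitpid failed. */
int run_with_cpu_cap(void (*work)(void), rlim_t cpu_seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: lower only the soft limit; the hard limit is kept. */
        struct rlimit lim;
        if (getrlimit(RLIMIT_CPU, &lim) != 0)
            _exit(127);
        lim.rlim_cur = cpu_seconds;
        if (setrlimit(RLIMIT_CPU, &lim) != 0)
            _exit(127);
        work();
        _exit(0);
    }

    /* Parent: collect the worker and report how it ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Stand-in for a stuck worker that never finishes. */
void spin_forever(void)
{
    volatile unsigned long n = 0;
    for (;;)
        n++;
}

/* Stand-in for a worker that completes normally. */
void finish_quickly(void)
{
}
```

A parent that sees SIGXCPU come back can log the event and fork a fresh worker for the request. The cap only bounds the damage; it does not address whatever makes the worker spin.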

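Code written in assembly can do some interesting things to the stack, such as true tail-recursion optimization, self-modifying code, non-returning functions, and so on. These can prevent the debugger from back-tracing the stack correctly and make it report a corrupt stack error, but that does not necessarily mean memory has been corrupted... though something unconventional is definitely going on under the hood.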