C++ libpthread program segfaults (reason unknown)

Question

I have an application linked against libpthread. Its core is two FIFOs shared by four threads (two threads per FIFO). The FIFO class is synchronized with pthread mutexes and stores pointers to large objects (each containing buffers of about 4 KB) that are allocated in static memory through overloaded new and delete operators, so there is no dynamic allocation.

The program usually works fine, but from time to time it segfaults for no visible reason. The problem is that I can't debug the segfaults properly, because I'm working on an embedded system with an old Linux kernel (2.4.29) and g++ (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)).

There's no gdb on the system, and I can't run the application elsewhere (it's too hardware-specific).

I compiled the application with the -g and -rdynamic flags, but an external gdb tells me nothing when I examine the core file (only hex addresses). I can still print a backtrace from the program after catching SIGSEGV, and it always looks like this:

Backtrace for process with pid: 6279
-========================================-
[0x8065707]
[0x806557a]
/lib/libc.so.6(sigaction+0x268) [0x400bfc68]
[0x8067bb9]
[0x8067b72]
[0x8067b25]
[0x8068429]
[0x8056cd4]
/lib/libpthread.so.0(pthread_detach+0x515) [0x40093b85]
/lib/libc.so.6(__clone+0x3a) [0x4015316a]
-========================================-
End of backtrace

So it seems to be pointing to libpthread...

I ran some of the modules through valgrind, but I didn't find any memory leaks (I'm barely using any dynamic allocation).

I thought the mutexes might be causing trouble (they are locked and unlocked about 200 times a second), so I switched from my simple mutex class:

class AGMutex {

    public:

        AGMutex( void ) {
            pthread_mutex_init( &mutex1, NULL );
        }

        ~AGMutex( void ) {
            pthread_mutex_destroy( &mutex1 );
        }

        void lock( void ) {
            pthread_mutex_lock( &mutex1 );
        }

        void unlock( void ) {
            pthread_mutex_unlock( &mutex1 );
        }

    private:

        pthread_mutex_t mutex1;

};

to a dummy mutex class:

class AGMutex {

    public:

        AGMutex( void ) : mutex1( false ) {
        }

        ~AGMutex( void ) {
        }

        volatile void lock( void ) {
            if ( mutex1 ) {
                while ( mutex1 ) {
                    usleep( 1 );
                }
            }
            mutex1 = true;
        }

        volatile void unlock( void ) {
            mutex1 = false;
        }

    private:

        volatile bool mutex1;

};

but it changed nothing and the backtrace looks the same...

After an old-school put-cout-between-every-line-and-see-where-it-segfaults-plus-remember-the-pids-and-stuff debugging session, it seems that it segfaults during usleep (?).

I have no idea what else could be wrong. It can work for an hour or so and then suddenly segfault for no apparent reason.

Has anybody ever encountered a similar problem?

Answer 1

From my answer to "How to generate a stacktrace when my gcc C++ app crashes":

    The first two entries in the stack frame chain when you get into the
    signal handler contain a return address inside the signal handler and
    one inside sigaction() in libc.  The stack frame of the last function
    called before the signal (which is the location of the fault) is lost.

This may explain why you are having difficulty determining the location of your segfault from a backtrace taken in a signal handler. My answer also includes a workaround for this limitation.
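To illustrate the general idea behind such a workaround, here is a minimal sketch that installs a handler with SA_SIGINFO and reads the interrupted program counter from the saved context, since that address is the one missing from the backtrace. It assumes Linux with glibc; the names FaultInfo, g_fault, pc_from_context and install_fault_handler are hypothetical, and this is not necessarily the code from the linked answer. Whether the very old kernel and libc in the question fill in the context the same way is also an assumption.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the REG_EIP / REG_RIP register indices
#endif
#include <signal.h>
#include <string.h>
#include <ucontext.h>

// Hypothetical record filled in by the handler.
struct FaultInfo {
    volatile sig_atomic_t seen;  // set to 1 once the handler has run
    void* fault_addr;            // si_addr: the address whose access faulted
    void* caller_pc;             // program counter at the time of the signal
};

static FaultInfo g_fault = { 0, 0, 0 };

// Extract the interrupted program counter from the saved context.
// The register layout is architecture-specific; unknown targets yield 0.
static void* pc_from_context(void* raw) {
    ucontext_t* uc = static_cast<ucontext_t*>(raw);
#if defined(__linux__) && defined(__i386__)
    return reinterpret_cast<void*>(uc->uc_mcontext.gregs[REG_EIP]);
#elif defined(__linux__) && defined(__x86_64__)
    return reinterpret_cast<void*>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return reinterpret_cast<void*>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return 0;
#endif
}

// Three-argument handler form, selected by SA_SIGINFO.
static void on_fault(int, siginfo_t* info, void* context) {
    g_fault.fault_addr = info->si_addr;
    g_fault.caller_pc = pc_from_context(context);
    g_fault.seen = 1;
}

// Returns true if the handler was installed for the given signal.
bool install_fault_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(signo, &sa, 0) == 0;
}
```

For a real SIGSEGV the handler should log caller_pc (for example next to the existing backtrace output) and then terminate with _exit() instead of returning, because returning would re-execute the faulting instruction.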

If you want to see how your application is actually laid out in memory (i.e. the 0x80..... addresses), you should be able to generate a map file from gcc. This is typically done with -Wl,-Map,output.map, which passes -Map output.map to the linker.

Your toolchain or cross-toolchain may also include a hardware-specific version of objdump or nm that can help decipher the 0x80..... addresses.
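Once you have symbol start addresses from a map file or from nm, deciphering a raw backtrace entry comes down to finding the closest symbol at or below that address. The following is a rough sketch of that lookup; Symbol and resolve are hypothetical names, and the table contents would have to come from your own binary.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One entry as it might be read from a map file or nm output.
struct Symbol {
    unsigned long address;  // start address of the symbol
    std::string name;
};

// Returns "name+0xoffset" for the closest symbol starting at or below addr,
// or "??" when every symbol in the table starts above addr.
std::string resolve(const std::vector<Symbol>& table, unsigned long addr) {
    const Symbol* best = 0;
    for (std::size_t i = 0; i < table.size(); ++i) {
        const Symbol& s = table[i];
        if (s.address <= addr && (!best || s.address > best->address)) {
            best = &s;
        }
    }
    if (!best) {
        return "??";
    }
    char offset[32];
    std::snprintf(offset, sizeof(offset), "+0x%lx", addr - best->address);
    return best->name + offset;
}
```

Because the table carries no symbol sizes, an address past the end of the last function is still attributed to that function, so a large offset should be treated with suspicion.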

Answer 2

Do you have access to Helgrind on your platform? It's a Valgrind tool for detecting POSIX thread errors such as races and threads holding mutexes when they exit.
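Independently of any tool, one common way to rule out the "mutex still held" kind of error is to tie unlocking to scope exit. Below is a minimal sketch, assuming a mutex class with the same lock()/unlock() interface as AGMutex in the question; ScopedLock, CountingMutex and guarded_divide are hypothetical names used only for illustration.

```cpp
// Hypothetical helper: locks on construction, unlocks on destruction,
// so every return path out of a scope releases the mutex.
template <typename Mutex>
class ScopedLock {
public:
    explicit ScopedLock(Mutex& m) : mutex_(m) { mutex_.lock(); }
    ~ScopedLock() { mutex_.unlock(); }

private:
    ScopedLock(const ScopedLock&);             // non-copyable: declared,
    ScopedLock& operator=(const ScopedLock&);  // intentionally not defined
    Mutex& mutex_;
};

// Stand-in with the AGMutex interface that only counts calls. It provides
// no real mutual exclusion and exists purely to show the guard's behaviour.
struct CountingMutex {
    int depth;         // lock() calls not yet matched by unlock()
    int acquisitions;  // total number of lock() calls
    CountingMutex() : depth(0), acquisitions(0) {}
    void lock() { ++depth; ++acquisitions; }
    void unlock() { --depth; }
};

// Example critical section with an early return.
int guarded_divide(CountingMutex& m, int a, int b, int* out) {
    ScopedLock<CountingMutex> guard(m);
    if (b == 0) {
        return -1;  // guard's destructor still unlocks here
    }
    *out = a / b;
    return 0;
}
```

With the real class from the question, the same guard would be declared as ScopedLock<AGMutex> at the top of each FIFO operation.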


Answer 1
1 2010-02-22 17:07:02

Answer 2
0 2010-02-26 16:10:02
