

Crash in pthread_specific() on Mac OS X

Original post: 2013-03-23 19:02:58. Tags: macos, crash, pthreads, indy, fpc

I'm getting a crash in pthread_specific() on OS X Lion in a 32-bit server application written with FPC and Indy 10, and I'm finding it very hard to track down the cause. The crash occurs because gs:[tlsindex] is not readable, but I have no idea why. The tlsindex is correct, so the descriptor table must somehow have become corrupted.

Is there a way to print the descriptor table using gdb / Xcode 4 on OS X? If I knew its address in memory, I could set a data breakpoint on it and hopefully break at the code that corrupts it. Unfortunately, I can't find any information on how TLS is actually implemented on OS X (i386).

Or perhaps someone has a brilliant idea on how to tackle this problem?

1 Answer

I'll answer my own question in case it is ever useful to someone else. OS X sets up gs to point to the TLS storage for the current thread. This storage is actually part of the thread's data block (struct _pthread), as can be seen in the Darwin source code: http://www.opensource.apple.com/source/Libc/Libc-391/pthreads/pthread_internals.h

It's easy to retrieve a pointer to this data block: pthread_self will return it. By logging this pointer, I found out that the data block was most likely being freed by someone else while the thread was still executing. By trapping vm_deallocate using mach_override, I found out that this was done by the cleanup code for another thread.
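The logging step can be sketched roughly as follows. This is only an illustration, not the code from the original application (which was written in FPC): format_self and worker are hypothetical names, and the cast assumes pthread_t is a pointer or an integer type, as it is on OS X and Linux.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the calling thread's pthread_t as hex.
   On OS X the value is the address of the thread's data block; on other
   platforms it may just be an opaque integer, hence the uintptr_t cast. */
static int format_self(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%jx", (uintmax_t)(uintptr_t)pthread_self());
}

/* Hypothetical thread body: log the identity on entry and on exit so a
   reused address shows up in the log as two overlapping lifetimes. */
static void *worker(void *arg) {
    char id[32];
    format_self(id, sizeof id);
    fprintf(stderr, "thread %s: start\n", id);
    /* ... real work would go here ... */
    fprintf(stderr, "thread %s: exit\n", id);
    return arg;
}
```

If the same address is logged by a new thread before the old one has been joined or has exited, that is a hint that the old thread's storage was already released and recycled.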

Eventually it turned out that I was calling pthread_join on a thread that had already been detached via pthread_detach. Both functions free the thread storage. After the thread had been detached (but before the erroneous join), another thread happened to be created at the exact same base address. The join then freed the new thread's data block, leaving that thread to execute without one. The bug stemmed from a difference in behavior between the pthread library and Windows, where waiting on a thread (join) and closing it (detach) are two completely different things.
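One way to guard against this kind of mistake is to record whether a thread has already been joined or detached and refuse the second call. The wrapper below is a minimal sketch of that idea, not the fix used in the original application; worker_handle, worker_start, worker_detach, worker_join and echo_arg are all hypothetical names, and the handle is assumed to be used from a single owning thread (the flag is not synchronized).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: remembers whether the thread's resources have
   already been handed back to the library via join or detach. */
typedef struct {
    pthread_t thread;
    bool released;   /* true once joined, detached, or never started */
} worker_handle;

/* Trivial example thread body: returns its argument unchanged. */
static void *echo_arg(void *arg) { return arg; }

static int worker_start(worker_handle *w, void *(*fn)(void *), void *arg) {
    assert(w != NULL && fn != NULL);
    int rc = pthread_create(&w->thread, NULL, fn, arg);
    w->released = (rc != 0);   /* nothing to release if creation failed */
    return rc;
}

static int worker_detach(worker_handle *w) {
    if (w->released) return EINVAL;   /* already joined or detached */
    w->released = true;
    return pthread_detach(w->thread);
}

static int worker_join(worker_handle *w, void **result) {
    if (w->released) return EINVAL;   /* never join after a detach */
    w->released = true;
    return pthread_join(w->thread, result);
}
```

With this wrapper, a join attempted after a detach returns EINVAL without ever reaching pthread_join, so a stale pthread_t can no longer release storage that has since been reused by a newer thread.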