C Programming: Debugging with pthreads

Original post: 2009-06-11 13:18:05. Tags: c, multithreading, debugging, pthreads

One of the hardest things for me to initially adjust to was my first intense experience programming with pthreads in C. I was used to knowing exactly what the next line of code to be run would be, and most of my debugging techniques centered around that expectation.

What are some good techniques for debugging with pthreads in C? You can suggest personal methodologies without any added tools, tools you use, or anything else that helps you debug.

P.S. I do my C programming using gcc on Linux, but don't let that necessarily restrain your answer.

8 answers

Valgrind is an excellent tool for finding race conditions and pthreads API misuse. It keeps a model of the program's memory accesses (and perhaps of accesses to shared resources) and will detect missing locks even when the bug is benign (which of course means that it will completely unexpectedly become less benign at some later point).

To use it, invoke valgrind --tool=helgrind (see the Helgrind manual). There is also valgrind --tool=drd (see the DRD manual). Helgrind and DRD use different models, so they detect overlapping but possibly different sets of bugs. False positives may also occur.

Anyway, valgrind has saved me countless hours of debugging (not all of them, though :).

One of the things that will surprise you about debugging threaded programs is that you will often find that the bug changes, or even goes away, when you add printfs or run the program in the debugger (colloquially known as a Heisenbug).

In a threaded program, a Heisenbug usually means you have a race condition. A good programmer will look for shared variables or resources that are order-dependent. A crappy programmer will try to blindly fix it with sleep() statements.

In the 'thinking' phase, before you start coding, use the state machine concept. It can make the design much clearer.

printfs can help you understand the dynamics of your program. But they clutter up the source code, so use a macro DEBUG_OUT() and in its definition enable it with a boolean flag. Better still, set/clear this flag with a signal that you send via 'kill -USR1'. Send the output to a log file with a timestamp.

Also consider using assert(), and then analyze your core dumps using gdb and ddd.

Debugging a multithreaded application is difficult. A good debugger, such as GDB (with the optional DDD front end) for *nix environments or the one that comes with Visual Studio on Windows, will help tremendously.

I pretty much develop in an exclusively multi-threaded, high-performance world, so here's the general practice I use.

Design (the best optimization is a better algorithm):

1) Break your functions into LOGICALLY separable pieces. This means that a call does "A" and ONLY "A", not A then B then C...
2) NO SIDE EFFECTS: Abolish all nakedly global variables, static or not. If you cannot fully abolish side effects, isolate them to a few locations (concentrate them in the code).
3) Make as many isolated components RE-ENTRANT as possible. This means they're stateless: they take all their inputs as constants and only manipulate DECLARED, logically constant parameters to produce the output. Pass by value instead of by reference wherever you can.
4) If you have state, make a clear separation between stateless sub-assemblies and the actual state machine. Ideally the state machine will be a single function or class manipulating stateless components.

Debugging:

Threading bugs tend to come in two broad flavors: races and deadlocks. As a rule, deadlocks are much more deterministic.

1) Do you see data corruption? YES => Probably a race.
2) Does the bug arise on EVERY run rather than just some runs? YES => Likely a deadlock (races are generally non-deterministic).
3) Does the process ever hang? YES => There's a deadlock somewhere. If it only hangs sometimes, you probably have a race too.

Breakpoints often act much like synchronization primitives THEMSELVES in the code, because they're logically similar: they force execution to stall in the current context until some other context (you) sends a signal to resume. This means that you should view any breakpoints you have in code as altering its multi-threaded behavior; breakpoints WILL affect race conditions but (in general) not deadlocks.

As a rule, this means you should remove all breakpoints, identify the type of bug, THEN reintroduce them to try and fix it. Otherwise, they simply distort things even more.

My approach to multi-threaded debugging is similar to single-threaded, but more time is usually spent in the thinking phase:

Develop a theory as to what could be causing the problem.
Determine what kind of results could be expected if the theory is true.
If necessary, add code that can disprove or verify your results and theory.
If your theory is true, fix the problem.

Often, the 'experiment' that proves the theory is the addition of a critical section or mutex around suspect code. I will then try to narrow down the problem by systematically shrinking the critical section. Critical sections are not always the best fix (though they can often be the quick fix). However, they're useful for pinpointing the 'smoking gun'.

Like I said, the same steps apply to single-threaded debugging, though there it is far too easy to just jump into a debugger and have at it. Multi-threaded debugging requires a much stronger understanding of the code, as I usually find that running multi-threaded code through a debugger doesn't yield anything useful.

Also, Helgrind is a great tool. Intel's Thread Checker performs a similar function for Windows, but costs a lot more than Helgrind.

When I started doing multithreaded programming I... stopped using debuggers. For me the key point is good program decomposition and encapsulation.

Monitors are the easiest way to do error-free multithreaded programming. If you cannot avoid complex lock dependencies, then it is easy to check whether they are cyclic: wait until the program hangs and check the stack traces using 'pstack'. You can break cyclic locks by introducing some new threads and asynchronous communication buffers.

Use assertions, and make sure to write single-threaded unit tests for particular components of your software; you can then run them in a debugger if you want.

I tend to use lots of breakpoints. If you don't actually care about the thread function, but do care about its side effects, a good time to check them might be right before it exits or loops back to its waiting state or whatever else it's doing.