
How do I detect memory access violations and/or memory race conditions?

Original post: 2010-08-17 15:08:48 | tags: c++, algorithm

I have a target platform that reports when memory is read from or written to, as well as when locks (a mutex, for example) are taken or freed. It reports the program counter, data address and read/write flag. I am writing a program that uses this information on a separate host machine where the reports are received, so it does not interfere with the target. The target already reports this data, so I am not changing the target code at all.

Are there any references or already available algorithms that do this kind of detection? For example, some way of detecting race conditions when multiple threads try to write to a global variable without protecting it first.

I am currently brewing my own, but I am convinced there is some code out there that does this already, or at least some proven algorithm for how to go about it.

Note: This is not to detect memory leaks.

Note: The implementation language is C++.

I am trying to make the detection code I write platform agnostic, so I am using the STL and just standard C++ with libraries like boost, poco, loki.

Any leads will help.

Thanks.

5 answers

It is probably too late to talk you out of this, but this does not work. Threading races are caused by subtle timing issues between threads. You can never diagnose timing-related problems with logging. It's Heisenbergian: just logging alters the timing of a thread, especially the kind of logging you are contemplating. Infamously, there's plenty of software that shipped with logging kept turned on because it would nosedive with it turned off.

Flushing out threading bugs is hard. The kind of tool that works is one that intentionally injects random delays in code. Microsoft CHESS is an example; it works on native code too.

To address only part of your question: race conditions are extremely nasty precisely because there is no good way to test for them. By definition they're unpredictable sequences of events that are quite difficult to diagnose. Detection code depends on the race condition actually happening, and in that case it's likely that you'll see errant behavior anyway. Any test code you add may make them more or less likely to appear, or possibly even change the timing such that they never appear at all.

Instead of trying to detect race conditions, what about attempting a program design that makes you more resilient to having them in the first place?

For example, if your global variable were simply encapsulated in an object that knows all the proper protection that needs to happen on access, then it's impossible for threads to concurrently write to it, because such an interface doesn't exist. Programmatically preventing race conditions is going to be easier than trying to detect them algorithmically (chances are you'll still catch some during unit/subsystem testing).

There is no standard way, since the C/C++ standards do not deal with OS-specific concepts like memory protection. Have a look at Breakpad, the crash reporting library used by Mozilla on various platforms like OS X, Win32 or Linux.

Check out this article by Andrei Alexandrescu: http://www.drdobbs.com/184403766;jsessionid=LKUUBKFR00O0VQE1GHRSKH4ATMY32JVN

It advocates declaring data that is accessed by more than one thread as volatile. If the only place that casts away that volatility is your locking mechanism, the compiler will tell you, via an error, where you need to lock that data.

I have used this method and found it extremely helpful.

Hope that helps.

If you can run your application under Valgrind, it includes a tool called Helgrind that is designed to detect such races: http://valgrind.org/docs/manual/hg-manual.html