
Thread-safe data exchange between threads/shared memory in C++ on Linux

I'm a bit confused: in production we have two processes communicating via shared memory, and part of the data they exchange is a long and a bool. Access to this data is not synchronized. It's been working fine for a long time and still is. I know that modifying a value is not atomic, but considering that these values are modified/accessed millions of times, shouldn't this have failed by now?
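For context, here is a minimal sketch of what such an unsynchronized exchange between two processes might look like. This is an illustration only, assuming a POSIX shm_open/mmap setup; the struct and object names are invented for the example, not taken from the actual production code.

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// hypothetical layout: a long and a bool shared between two processes
struct SharedData
{
    long counter;
    bool flag;
};

int main()
{
    // one process creates the segment, the other simply opens the same name
    int fd = shm_open("/example_shm", O_CREAT | O_RDWR, 0666);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, sizeof(SharedData)) != 0) { perror("ftruncate"); return 1; }

    void* p = mmap(NULL, sizeof(SharedData), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    SharedData* shared = static_cast<SharedData*>(p);
    shared->counter = 42;   // plain, unsynchronized stores: exactly the
    shared->flag = true;    // pattern the question is about

    munmap(p, sizeof(SharedData));
    close(fd);
    return 0;
}

(On older glibc you may need to link with -lrt for shm_open.)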

Here is a sample piece of code, which exchanges a number between two threads:

#include <pthread.h>
#include <xmmintrin.h>

typedef unsigned long long uint64;
const uint64 ITERATIONS = 500LL * 1000LL * 1000LL;

//volatile uint64 s1 = 0;
//volatile uint64 s2 = 0;
uint64 s1 = 0;
uint64 s2 = 0;

// worker thread: waits for the main thread to bump s1, then bumps s2
void* run(void*)
{
    register uint64 value = s2;
    while (true)
    {
        while (value == s1)
        {
            _mm_pause(); // busy spin
        }
        //value = __sync_add_and_fetch(&s2, 1);
        value = ++s2;
    }
}

// main thread: waits for the worker to bump s2, then bumps s1
int main(int argc, char* argv[])
{
    pthread_t threads[1];
    pthread_create(&threads[0], NULL, run, NULL);

    register uint64 value = s1;
    while (s1 < ITERATIONS)
    {
        while (s2 != value)
        {
            _mm_pause(); // busy spin
        }
        //value = __sync_add_and_fetch(&s1, 1);
        value = ++s1;
    }
}

As you can see, I have commented out a couple of things:

//volatile uint64 s1 = 0;

and

//value = __sync_add_and_fetch(&s1, 1);

__sync_add_and_fetch atomically increments a variable.
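As a quick illustration of the builtin's semantics (it returns the new value, while its sibling __sync_fetch_and_add returns the old one):

typedef unsigned long long uint64;

int main()
{
    uint64 counter = 0;
    uint64 after  = __sync_add_and_fetch(&counter, 1); // counter == 1, after  == 1
    uint64 before = __sync_fetch_and_add(&counter, 1); // counter == 2, before == 1
    (void)after; (void)before;
    return 0;
}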

I know this is not very scientific, but running it a few times without the sync functions, it works totally fine. Furthermore, if I benchmark both versions, with and without sync, they run at the same speed. How come __sync_add_and_fetch is not adding any overhead?

My guess is that the compiler is guaranteeing atomicity for these operations, and therefore I don't see a problem in production. But that still cannot explain why __sync_add_and_fetch is not adding any overhead (even when running in debug).

Some more details about my environment: Ubuntu 10.04, gcc 4.4.3, Intel i5 multicore CPU.

The production environment is similar; it just runs on more powerful CPUs and on CentOS.

Thanks for your help.

Basically, you're asking why you see no difference in behavior/performance between

s2++;

and

__sync_add_and_fetch(&s2, 1);

Well, if you go and look at the actual code generated by the compiler in these two cases, you will see that there IS a difference: the s2++ version will have a simple INC instruction (or possibly an ADD), while the __sync version will have a LOCK prefix on that instruction.
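To see this for yourself, here is a small sketch you can compile (e.g. with g++ -O2 -S) and inspect. The exact instructions depend on the compiler and flags, so the comments only show the typical shape on x86-64:

typedef unsigned long long uint64;
uint64 s2 = 0;

void plain_increment()
{
    ++s2;                          // typically: incq s2(%rip)  or  addq $1, s2(%rip)
}

void locked_increment()
{
    __sync_add_and_fetch(&s2, 1);  // typically: lock addq $1, s2(%rip)  (or lock xadd)
}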

So why does it work without the LOCK prefix? Well, while in general the LOCK prefix is required for this to work on ANY x86-based system, it turns out it's not needed for yours. With Intel Core based chips, the LOCK is only needed to synchronize between different CPUs over the bus. When running on a single CPU (even with multiple cores), it does its internal synchronization without it.

So why do you see no slowdown in the __sync case? Well, a Core i7 is a 'limited' chip in that it only supports single-socket systems, so you can't have multiple CPUs. That means the LOCK is never needed, and in fact the CPU just ignores it completely. The code is 1 byte larger, which could have an impact if you were ifetch- or decode-limited, but you're not, so you see no difference.

If you were to run on a multi-socket Xeon system, you would see a (small) slowdown for the LOCK prefix, and could also see (rare) failures in the non-LOCK version.

I think the compiler generates no atomicity unless you use some compiler-specific constructs, so that's a no-go.

If only two processes are using the shared memory, usually no problems will occur, especially if the code snippets are short enough. The operating system prefers to block one process and run another when it suits it best (e.g. on I/O), so it will run one to a good point of isolation, then switch to the next.

Try running a few instances of the same application and see what happens.

I see you're using Martin Thompson's inter-thread-latency example.

"My guess is that the compiler is guaranteeing atomicity for these operations, and therefore I don't see a problem in production. But that still cannot explain why __sync_add_and_fetch is not adding any overhead (even when running in debug)."

The compiler doesn't guarantee anything here; the x86 platform you're running on does. This code will probably fail on funky hardware.

Not sure what you're doing, but C++11 does provide atomicity with std::atomic. You can also have a look at boost::atomic. I assume you're interested in the Disruptor pattern, so I'll shamelessly plug my C++ port, called disruptor--.
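For reference, here is a minimal sketch of the same ping-pong written with C++11 std::atomic. It assumes a C++11 compiler; the memory orderings chosen here are one reasonable option, not the only correct one.

#include <atomic>
#include <thread>

typedef unsigned long long uint64;
const uint64 ITERATIONS = 500LL * 1000LL * 1000LL;

std::atomic<uint64> s1(0);
std::atomic<uint64> s2(0);

void run()
{
    uint64 value = s2.load(std::memory_order_relaxed);
    while (true)
    {
        while (value == s1.load(std::memory_order_acquire))
            ; // busy spin until the main thread bumps s1
        value = s2.fetch_add(1, std::memory_order_release) + 1;
    }
}

int main()
{
    std::thread t(run);

    uint64 value = s1.load(std::memory_order_relaxed);
    while (s1.load(std::memory_order_relaxed) < ITERATIONS)
    {
        while (s2.load(std::memory_order_acquire) != value)
            ; // busy spin until the worker bumps s2
        value = s1.fetch_add(1, std::memory_order_release) + 1;
    }

    // the worker spins forever, just as in the original example,
    // so detach it and let process exit tear it down
    t.detach();
    return 0;
}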
