
Does the use of an anonymous pipe introduce a memory barrier for interthread communication?

For example, say I allocate a struct with new and write the pointer into the write end of an anonymous pipe.

If I read the pointer from the corresponding read end, am I guaranteed to see the 'correct' contents of the struct?

Also of interest is whether the results of socketpair() on Unix and self-connecting over TCP loopback on Windows have the same guarantees.

The context is a server design which centralizes event dispatch with select/epoll.
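
For concreteness, here is a minimal sketch of the setup being asked about (a hypothetical Payload type; pipe creation and error handling omitted):

#include <unistd.h>

struct Payload { int value; };

int fds[2];   // fds[0] = read end, fds[1] = write end; assume pipe(fds) was called earlier

void writer_thread() {
    Payload* p = new Payload{42};     // fill the struct...
    write(fds[1], &p, sizeof(p));     // ...then push its address into the pipe
}

void reader_thread() {
    Payload* p = nullptr;
    read(fds[0], &p, sizeof(p));      // pull the address back out
    // The question: is p->value guaranteed to be observed as 42 here?
}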

For example, say I allocate a struct with new and write the pointer into the write end of an anonymous pipe.

If I read the pointer from the corresponding read end, am I guaranteed to see the 'correct' contents of the struct?

No. There is no guarantee that the writing CPU will have flushed the write out of its cache and made it visible to the other CPU that might do the read.

Also of interest is whether the results of socketpair() on Unix and self-connecting over TCP loopback on Windows have the same guarantees.

No.

In practice, calling write(), which is a system call, will end up locking one or more data structures in the kernel, which should take care of the reordering issue. For example, POSIX requires subsequent reads to see data written before their call, which implies a lock (or some kind of acquire/release) by itself.

As for whether that's part of the formal spec of the calls, probably it's not.
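
If you would rather not rely on that implicit kernel synchronization, one sketch is to make the ordering explicit with C++ fences around the transfer (hypothetical Payload type; note that, formally, C++ fences only synchronize through atomic operations, so this is belt-and-braces on top of the kernel's own locking rather than a standards-level guarantee):

#include <atomic>
#include <unistd.h>

struct Payload { int value; };   // hypothetical struct being published

void send_ptr(int write_fd, Payload* p) {
    std::atomic_thread_fence(std::memory_order_release);   // make p's contents visible first
    write(write_fd, &p, sizeof(p));                         // then hand out the address
}

Payload* recv_ptr(int read_fd) {
    Payload* p = nullptr;
    read(read_fd, &p, sizeof(p));                           // get the address back
    std::atomic_thread_fence(std::memory_order_acquire);    // before dereferencing it
    return p;
}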

A pointer is just a memory address, so provided you are in the same process the pointer will be valid in the receiving thread and will point to the same struct. If you are in different processes, at best you will immediately get a memory error; at worst you will read (or write) random memory, which is essentially Undefined Behaviour.

Will you read the correct content? It is neither better nor worse than if your pointer were in a static variable shared by both threads: you still have to do some synchronization if you want consistency.

Does the kind of transfer channel matter, be it static memory (shared by threads), anonymous pipes, socket pairs, TCP loopback, etc.? No: all those channels transfer bytes, so if you pass a memory address, you will get your memory address. What is left then is synchronization, because here you are just sharing a memory address.

If you do not use any other synchronization, anything can happen (did I already speak of Undefined Behaviour?); see the sketch after this list:

  • the reading thread can access the memory before it has been written by the writing one, giving stale data
  • if you forgot to declare the struct members as volatile, the reading thread can keep using cached values, here again getting stale data
  • the reading thread can read partially written data, meaning incoherent data
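
A minimal sketch of that synchronization, assuming you can add a flag to the struct yourself (a hypothetical ready member; release on the writing side, acquire on the reading side):

#include <atomic>

struct Payload {
    int value;
    std::atomic<bool> ready{false};   // hypothetical flag added for publication
};

// writing thread: fill the fields, then release-store the flag
void prepare(Payload* p) {
    p->value = 42;
    p->ready.store(true, std::memory_order_release);
}

// reading thread: acquire-load the flag before trusting the other fields
int consume(Payload* p) {
    while (!p->ready.load(std::memory_order_acquire)) { /* spin */ }
    return p->value;
}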

Interesting question with, so far, only one correct answer from Cornstalks.

Within the same (multi-threaded) process there are no guarantees since pointer and data follow different paths to reach their destination. Implicit acquire/release guarantees do not apply since the struct data cannot piggyback on the pointer through the cache and formally you are dealing with a data race.

However, looking at how the pointer and the struct data itself reach the second thread (through the pipe and the memory cache respectively), there is a real chance that this mechanism is not going to cause any harm. Sending the pointer to a peer thread takes 3 system calls (write() in the sending thread, select() and read() in the receiving thread), which is (relatively) expensive, and by the time the pointer value is available in the receiving thread, the struct data has probably arrived long before.

Note that this is just an observation, the mechanism is still incorrect.

I believe your case might be reduced to this two-thread model:

#include <atomic>
#include <cassert>

int data = 0;
std::atomic<int*> atomicPtr{nullptr};
//...

void thread1()
{
    data = 42;
    atomicPtr.store(&data, std::memory_order_release);
}
}

void thread2()
{
    int* ptr = nullptr;
    while(!ptr)
        ptr = atomicPtr.load(std::memory_order_consume);
    assert(*ptr == 42);
}
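
A minimal driver for this model might look like the following (assuming the snippet above plus <thread>; on mainstream compilers memory_order_consume is implemented as acquire, so the assert should hold):

#include <thread>

int main()
{
    std::thread t2(thread2);   // spins until atomicPtr becomes non-null
    std::thread t1(thread1);   // publishes &data with release semantics
    t1.join();
    t2.join();
    return 0;
}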

Since you have 2 processes you can't use one atomic variable across them, but since you listed Windows you can omit atomicPtr.load(std::memory_order_consume) from the consuming part because, AFAIK, all the architectures Windows runs on guarantee this load to be correct without any barrier on the loading side. In fact, I think there are not many architectures out there where that instruction would not be a no-op (I have heard only about DEC Alpha).

I agree with Serge Ballesta's answer. Within the same process, it is feasible to send and receive an object's address via an anonymous pipe.

Since the write system call is guaranteed to be atomic when the message size is below PIPE_BUF (normally 4096 bytes), multiple producer threads will not mess up each other's object addresses (8 bytes for 64-bit applications).

Talk is cheap, here is the demo code for Linux (defensive code and error handlers are omitted for simplicity). Just copy & paste it into pipe_ipc_demo.cc, then compile and run the test.

#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <pthread.h>
#include <string>
#include <list>

template<class T> class MPSCQ { // pipe based Multi Producer Single Consumer Queue
public:
        MPSCQ();
        ~MPSCQ();
        int producerPush(const T* t); 
        T* consumerPoll(double timeout = 1.0);
private:
        void _consumeFd();
        int _selectFdConsumer(double timeout);
        T* _popFront();
private:
        int _fdProducer;
        int _fdConsumer;
        char* _consumerBuf;
        std::string* _partial;
        std::list<T*>* _list;
        static const int _PTR_SIZE;
        static const int _CONSUMER_BUF_SIZE;
};

template<class T> const int MPSCQ<T>::_PTR_SIZE = sizeof(void*);
template<class T> const int MPSCQ<T>::_CONSUMER_BUF_SIZE = 1024;

template<class T> MPSCQ<T>::MPSCQ() :
        _fdProducer(-1),
        _fdConsumer(-1) {
        _consumerBuf = new char[_CONSUMER_BUF_SIZE];
        _partial = new std::string;     // for holding partial pointer address
        _list = new std::list<T*>;      // unconsumed T* cache
        int fd_[2];
        int r = pipe(fd_);
        _fdConsumer = fd_[0];
        _fdProducer = fd_[1];
}


template<class T> MPSCQ<T>::~MPSCQ() { /* omitted */ }

template<class T> int MPSCQ<T>::producerPush(const T* t) {
        return t == NULL ? 0 : write(_fdProducer, &t, _PTR_SIZE);
}

template<class T> T* MPSCQ<T>::consumerPoll(double timeout) {
        T* t = _popFront();
        if (t != NULL) {
                return t;
        }
        if (_selectFdConsumer(timeout) <= 0) {  // timeout or error
                return NULL;
        }
        _consumeFd();
        return _popFront();
}

template<class T> void MPSCQ<T>::_consumeFd() {
        memcpy(_consumerBuf, _partial->data(), _partial->length());
        // append new bytes after the partial pointer bytes carried over from the previous read
        ssize_t r = read(_fdConsumer, _consumerBuf + _partial->length(), _CONSUMER_BUF_SIZE - _partial->length());
        if (r <= 0) {   // EOF or error, error handler omitted
                return;
        }
        const char* p = _consumerBuf;
        int remaining_len_ = _partial->length() + r;
        T* t;
        while (remaining_len_ >= _PTR_SIZE) {
                memcpy(&t, p, _PTR_SIZE);
                _list->push_back(t);
                remaining_len_ -= _PTR_SIZE;
                p += _PTR_SIZE;
        }
        *_partial = std::string(p, remaining_len_);
}

template<class T> int MPSCQ<T>::_selectFdConsumer(double timeout) {
        int r;
        int nfds_ = _fdConsumer + 1;
        fd_set readfds_;
        struct timeval timeout_;
        int64_t usec_ = timeout * 1000000.0;
        while (true) {
                timeout_.tv_sec = usec_ / 1000000;
                timeout_.tv_usec = usec_ % 1000000;
                FD_ZERO(&readfds_);
                FD_SET(_fdConsumer, &readfds_);
                r = select(nfds_, &readfds_, NULL, NULL, &timeout_);
                if (r < 0 && errno == EINTR) {
                        continue;
                }
                return r;
        }
}

template<class T> T* MPSCQ<T>::_popFront() {
        if (!_list->empty()) {
                T* t = _list->front();
                _list->pop_front();
                return t;
        } else {
                return NULL;
        }
}

// = = = = = test code below = = = = =

#define _LOOP_CNT    5000000
#define _ONE_MILLION 1000000
#define _PRODUCER_THREAD_NUM 2

struct TestMsg {        // all public
        int _threadId;
        int _msgId;
        int64_t _val;
        TestMsg(int thread_id, int msg_id, int64_t val) :
                _threadId(thread_id),
                _msgId(msg_id),
                _val(val) { };
};

static MPSCQ<TestMsg> _QUEUE;
static int64_t _SUM = 0;

void* functor_producer(void* arg) {
        int my_thr_id_ = pthread_self();
        TestMsg* msg_;
        for (int i = 0; i <= _LOOP_CNT; ++ i) {
                if (i == _LOOP_CNT) {
                        msg_ = new TestMsg(my_thr_id_, i, -1);
                } else {
                        msg_ = new TestMsg(my_thr_id_, i, i + 1);
                }
                _QUEUE.producerPush(msg_);
        }
        return NULL;
}


void* functor_consumer(void* arg) {
        int msg_cnt_ = 0;
        int stop_cnt_ = 0;
        TestMsg* msg_;
        while (true) {
                if ((msg_ = _QUEUE.consumerPoll()) == NULL) {
                        continue;
                }
                int64_t val_ = msg_->_val;
                delete msg_;
                if (val_ <= 0) {
                        if ((++ stop_cnt_) >= _PRODUCER_THREAD_NUM) {
                                printf("All done, _SUM=%ld\n", _SUM);
                                break;
                        }
                } else {
                        _SUM += val_;
                        if ((++ msg_cnt_) % _ONE_MILLION == 0) {
                                printf("msg_cnt_=%d, _SUM=%ld\n", msg_cnt_, _SUM);
                        }
                }
        }
        return NULL;
}

int main(int argc, char* const* argv) {
        pthread_t consumer_;
        pthread_create(&consumer_, NULL, functor_consumer, NULL);
        pthread_t producers_[_PRODUCER_THREAD_NUM];
        for (int i = 0; i < _PRODUCER_THREAD_NUM; ++ i) {
                pthread_create(&producers_[i], NULL, functor_producer, NULL);
        }
        for (int i = 0; i < _PRODUCER_THREAD_NUM; ++ i) {
                pthread_join(producers_[i], NULL);
        }
        pthread_join(consumer_, NULL);
        return 0;
}

And here is the test result (2 * sum(1..5000000) == (1 + 5000000) * 5000000 == 25000005000000):

$ g++ -o pipe_ipc_demo pipe_ipc_demo.cc -lpthread
$ ./pipe_ipc_demo    ## output may vary except for the final _SUM
msg_cnt_=1000000, _SUM=251244261289
msg_cnt_=2000000, _SUM=1000708879236
msg_cnt_=3000000, _SUM=2250159002500
msg_cnt_=4000000, _SUM=4000785160225
msg_cnt_=5000000, _SUM=6251640644676
msg_cnt_=6000000, _SUM=9003167062500
msg_cnt_=7000000, _SUM=12252615629881
msg_cnt_=8000000, _SUM=16002380952516
msg_cnt_=9000000, _SUM=20252025092401
msg_cnt_=10000000, _SUM=25000005000000
All done, _SUM=25000005000000

The technique shown here is used in our production applications. One typical usage is that the consumer thread acts as a log writer, while worker threads can write log messages almost asynchronously. Yes, almost, meaning writer threads may sometimes block in write() when the pipe is full; this is a reliable congestion-control feature provided by the OS.
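
As a sketch of that logging usage (a hypothetical LogMsg type and helper functions reusing the MPSCQ above, not the actual production code):

struct LogMsg {
        std::string _text;
        LogMsg(const std::string& text) : _text(text) {}
};

static MPSCQ<LogMsg> _LOG_QUEUE;

// worker threads: enqueue and return almost immediately
void log_async(const std::string& text) {
        _LOG_QUEUE.producerPush(new LogMsg(text));
}

// dedicated writer thread: drain the queue and do the slow I/O
void* functor_log_writer(void*) {
        while (true) {
                LogMsg* msg_ = _LOG_QUEUE.consumerPoll();
                if (msg_ == NULL) {
                        continue;
                }
                fprintf(stderr, "%s\n", msg_->_text.c_str());
                delete msg_;
        }
        return NULL;
}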
