
How to improve the run-time performance of C++ pthread code that uses barriers

I wrote code to simulate a communication system. One part of it, which I run in parallel using pthreads, corrects errors that are caused by the channel.

When I receive a frame of bits, 3 components of the algorithm run over the bits. Usually they run one after the other, which gives optimal error-correction performance but a huge delay.

The idea is to make them run in parallel. But to get optimal performance, the 3 components must process each bit at the same time.

If I just run them in parallel, I get bad results, although the code runs quite fast. So I used barriers to synchronize the threads, so that each bit is processed by all three components before any of them moves on to the next bit.
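For reference, the per-bit barrier scheme described above might look roughly like the following. This is only a minimal sketch, not the actual code: LockstepJob, component and run_lockstep are hypothetical names, and the summation is a placeholder for the real per-bit work of each component.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread state for this sketch.
struct LockstepJob {
    const std::vector<int>* bits;  // the received frame (shared, read-only)
    pthread_barrier_t* barrier;    // barrier shared by all components
    long sum;                      // placeholder for a component's output
};

// Each component handles bit i, then waits at the barrier until the
// other components have finished bit i as well.
void* component(void* arg) {
    LockstepJob* job = static_cast<LockstepJob*>(arg);
    for (std::size_t i = 0; i < job->bits->size(); ++i) {
        job->sum += (*job->bits)[i];         // placeholder for the real work
        pthread_barrier_wait(job->barrier);  // one synchronisation per bit
    }
    return nullptr;
}

// Runs n_components threads in lockstep over the frame and returns
// each thread's placeholder result.
std::vector<long> run_lockstep(const std::vector<int>& bits,
                               unsigned n_components = 3) {
    assert(n_components > 0);
    pthread_barrier_t barrier;
    pthread_barrier_init(&barrier, nullptr, n_components);

    // Sized up front so the pointers handed to the threads stay valid.
    std::vector<LockstepJob> jobs(n_components, LockstepJob{&bits, &barrier, 0});
    std::vector<pthread_t> threads(n_components);
    for (unsigned t = 0; t < n_components; ++t)
        pthread_create(&threads[t], nullptr, component, &jobs[t]);
    for (unsigned t = 0; t < n_components; ++t)
        pthread_join(threads[t], nullptr);
    pthread_barrier_destroy(&barrier);

    std::vector<long> out;
    for (const LockstepJob& j : jobs) out.push_back(j.sum);
    return out;
}
```

With this structure, a frame of N bits costs N barrier waits per thread, however little work each bit needs.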

The error-correction performance of this method is optimal, but the code runs really slowly, even slower than the serial implementation.

The code runs on Ubuntu and is compiled with GCC.

EDIT: One more question: do threads go to sleep while they are waiting for the barrier to open? If so, how do I prevent them from doing so?

If you literally have to synchronise after every bit, then quite simply threading is not an appropriate approach. The synchronisation overhead will far exceed the cost of the computation, so you will be better off doing it in a single thread.

Can you split the work up at a higher level? For example, have an entire frame processed by a single thread, but have multiple frames processed in parallel?
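As a rough illustration of that idea, the sketch below gives each thread whole frames to process, so no synchronisation is needed inside a frame. It assumes frames are independent of one another, which the question does not confirm. Frame, decode_frame and decode_frames_parallel are hypothetical names, and the bit flipping is only a stand-in for the three real components.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Frame = std::vector<int>;

// Placeholder for running the three components serially on one frame.
// Here each "component" simply flips every bit.
Frame decode_frame(const Frame& in) {
    Frame out = in;
    for (int pass = 0; pass < 3; ++pass)
        for (int& bit : out) bit ^= 1;
    return out;
}

// Thread t handles frames t, t + n_threads, t + 2 * n_threads, ...
// Every thread writes to distinct elements of results, so no locking
// is required.
std::vector<Frame> decode_frames_parallel(const std::vector<Frame>& frames,
                                          unsigned n_threads) {
    assert(n_threads > 0);
    std::vector<Frame> results(frames.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < frames.size(); i += n_threads)
                results[i] = decode_frame(frames[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}
```

The only synchronisation left is the join at the end, so its cost is paid once per batch of frames instead of once per bit.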

Here is a possible solution that uses no mutex.

Let's say you have 4 threads: the main thread reads the input, and the other 3 threads process it chunk by chunk. A thread can process a chunk as soon as the previous thread has finished processing it.

So you have a data type for a chunk:

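#include <atomic>

class chunk {
public:
    unsigned char buffer[CHUNK_SIZE]; // CHUNK_SIZE is assumed to be defined elsewhere
    std::atomic_int status;           // C++11; a plain char is not guaranteed to be atomic across threads
    chunk() : status(0) {}            // 0 = nothing has been done with this chunk yet
};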

And you have a list of chunks:

std::list<chunk> chunks;

All threads iterate over the chunks until they reach the end of the list, but each one waits until a chunk's status reaches the value it expects. The main thread sets status to 1 when the input for a chunk is complete. Thread 1 waits until status is 1 (input done), processes the chunk, and sets status to 2. Thread 2 waits until status is 2 (thread 1 done), processes the chunk, and sets status to 3, and so on. Finally, the main thread waits until status is 4 to collect the results.

When setting the status, it is important to use plain assignment (=) rather than ++, so that the update is a single store instead of a read-modify-write sequence.
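Put together, the scheme above could be sketched as follows. This is an illustration under a few assumptions rather than a definitive implementation: it uses C++11 std::thread and std::atomic<int> instead of raw pthreads and a plain char, stores the chunks in a pre-sized std::vector instead of a std::list, and replaces the real processing with a trivial addition. Chunk, stage and run_pipeline are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Chunk {
    int value = 0;               // stand-in for the chunk's buffer
    std::atomic<int> status{0};  // 0 = empty, 1 = input ready, k + 1 = stage k done
};

// Stage k (1-based) waits until a chunk reaches status k, works on it,
// then publishes status k + 1 with a single store.
void stage(std::vector<Chunk>& chunks, int k) {
    for (Chunk& c : chunks) {
        while (c.status.load(std::memory_order_acquire) != k)
            std::this_thread::yield();  // busy-wait, no mutex
        c.value += k;                   // placeholder for the real work
        c.status.store(k + 1, std::memory_order_release);
    }
}

// The calling thread plays the role of the main thread: it fills the
// chunks, marks them ready, and collects the results at the end.
std::vector<int> run_pipeline(const std::vector<int>& input, int n_stages = 3) {
    assert(n_stages > 0);
    std::vector<Chunk> chunks(input.size());
    std::vector<std::thread> workers;
    for (int k = 1; k <= n_stages; ++k)
        workers.emplace_back(stage, std::ref(chunks), k);

    for (std::size_t i = 0; i < input.size(); ++i) {
        chunks[i].value = input[i];
        chunks[i].status.store(1, std::memory_order_release);  // input done
    }

    std::vector<int> out;
    for (Chunk& c : chunks) {
        while (c.status.load(std::memory_order_acquire) != n_stages + 1)
            std::this_thread::yield();
        out.push_back(c.value);
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

The acquire/release pair on status is what makes the previous stage's writes to the chunk visible to the next stage. The waiting threads spin rather than sleep, so this trades CPU time for latency.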

