
C++ - Questions about multithreading

I am having trouble understanding some concepts of multithreading. I know the basic principles, but I am having trouble understanding when individual threads are dispatched to and used by cores.

I know that having multiple threads allows code to run in parallel. I think this would be a good addition to my archive extraction program, which could decompress blocks using multiple cores. It currently decompresses all of the files in a for loop, and I am hoping that each available core will work on a file.

Here are my questions:

  1. Do I need to query or even consider the number of cores on a machine, or are running threads automatically sent to free cores?

  2. Can anyone show me an example of a for loop using threads? Say each loop iteration calls a function on a different thread. I read that the ideal number of active threads is the number of cores. How do I know when a core is free? Or should I check whether a thread has joined the main thread, and create a new thread when it has, to keep a certain number of threads running?

Am I overcomplicating things or are my questions indicative that I am not grasping the concepts?

If you're decompressing files then you'll probably want a bounded number of threads rather than one thread per file. Otherwise, if you're processing 1000 files you're going to create 1000 threads, which won't make efficient use of the CPU.

As you've mentioned, one approach is to create as many threads as there are cores, and this is a reasonable approach in your case as decompression is reasonably CPU bound, and therefore any threads you create will be active for most of their time slice. If your problem were IO bound, then your threads would spend a lot of time waiting for IO to complete, and therefore you could spin up more threads than you have cores, within bounds.

For your application I'd probably look at spinning up one thread per core, and have each thread process one file at a time. This will help keep your algorithm simple. If you had multiple threads working on one file then you'd have to synchronize between them in order to ensure that the blocks they processed were written out to the correct location in the uncompressed file, which will cause needless headaches.

C++11 includes a thread library which you can use to simplify working with threads.

  1. No, you can use an API that keeps that transparent, for example POSIX threads on Linux (the pthread library).

  2. This answer probably depends on what API you use, though many APIs share threading basics like mutexes. Here, however, is a pthreads example (since that's the only C/C++ threading API I know).

     #include <stdio.h>
     #include <stdlib.h>
     #include <pthread.h>
     // Whatever other headers you need for your code.

     #define MAX_NUM_THREADS 12

     // Each thread will run this function.
     void *worker( void *arg )
     {
         // Do stuff here and it will be 'in parallel'.
         // Note: Threads can read from the same location concurrently
         // without issue, but writing to any shared resource that has not been
         // locked with, for example, a mutex, can cause pernicious bugs.

         // Call this when you're done.
         pthread_exit( NULL );
     }

     int main()
     {
         // Each is a handle for one thread, with 12 in total.
         pthread_t myThreads[MAX_NUM_THREADS];

         // Create the worker threads.
         for(unsigned long i = 0; i < MAX_NUM_THREADS; i++)
         {
             // NULL thread attributes struct.
             // This initializes the threads with the default PTHREAD_CREATE_JOINABLE
             // attribute; we know a thread is finished when it joins, see below.
             pthread_create(&myThreads[i], NULL, worker, (void *)i);
         }

         void *status;

         // Wait for the threads to finish.
         for(unsigned int i = 0; i < MAX_NUM_THREADS; i++)
         {
             pthread_join(myThreads[i], &status);
         }

         // That's all, folks.
         pthread_exit(NULL);
     }

Without too much detail, that's a pretty basic skeleton for a simple threaded application using pthreads.

Regarding your questions on the best way to go about applying this to your program:

I suggest one thread per file, using a Threadpool Pattern, and here's why:

A single thread per file is much simpler because there's no sharing, hence no synchronization. You can change the worker function to a decompressFile function, passing a filename each time you call pthread_create. That's basically it. Your threadpool pattern sort of falls into place here.

Multiple threads per file means synchronization, which means complexity, because you have to manage access to shared resources. In order to speed up your algorithm, you'd have to isolate portions of it that can run in parallel. However, I would actually expect this method to run slower:

Imagine Thread A has File A open, and Thread B has File B open, but File A and File B are in completely different sectors of your disk. As your OS's scheduling algorithm switches between Thread A and Thread B, your hard drive has to spin like mad to keep up, making the CPU (hence your program) wait.

Since you are seemingly new to threading/parallelism, and you just want to get more performance out of multiple processors/cores, I suggest you look for libraries that deal with threading and allow you to enable parallelism without getting into thread management, work distribution etc.

It sounds like all you need now is parallel loop execution. Nowadays there are plenty of C++ libraries that can ease this task for you, e.g. Intel's TBB, Microsoft's PPL, AMD's Bolt, Qualcomm's MARE, to name a few. You can compare licensing terms, supported platforms, and functionality, and make the choice that best fits your needs.

To be more specific and answer your questions:

1) Generally, you should have no need to know or consider the number of processors or cores. Choose a library that abstracts this detail away from you and your program. On the other hand, if you see that with the default settings the CPU is not fully utilized (e.g. due to a significant number of I/O operations), you may find it useful to ask for more threads, e.g. by multiplying the default by a certain factor.

2) A sketch of a for loop made parallel with tbb::parallel_for and C++11 lambda functions:

#include <tbb/tbb.h>
void ParallelFoo( std::vector<MyDataType>& v ) {
    tbb::parallel_for( size_t(0), v.size(), [&](size_t i){
        Foo( v[i] );
    } );
}

Note that it is not guaranteed that each iteration is executed by a separate thread, but you should not actually worry about such details; all you need is for the available cores to be busy with useful work.

Disclaimer: I'm a developer of Intel's TBB library.

If you're on Windows, you could take a look at Thread Pools; a good description can be found here: http://msdn.microsoft.com/en-us/magazine/cc163327.aspx . An interesting feature of this facility is that it promises to manage the threads for you. It also selects the optimal number of threads depending on demand as well as on the available cores.

