C++: How to divide CPU work evenly across threads in a non-parallel task

Question

I'm trying to make a calculator for prime numbers.

The program divides n by every number up to n. If the remainder of the division is 0 exactly once (excluding the division by 1), the number is prime. At the start of the program the user is asked to type a number, and the check is then run for every number up to the one typed by the user.

This is a non-parallel task, but I am trying to make it parallel by dividing the numbers between cores.

This is the piece of code that divides the task between the threads.

void division(int number)
{
    int ithread[8]{};
    int sum = 0;
    cout << "Preparation..";
    /* Calculating how many numbers the checker will check. */
    ithread[0] = (int)number*0.125;
    ithread[1] = (int)number*0.125;
    ithread[2] = (int)number*0.125;
    ithread[3] = (int)number*0.125;
    ithread[4] = (int)number*0.125;
    ithread[5] = (int)number*0.125;
    ithread[6] = (int)number*0.125;
    ithread[7] = (int)number*0.125;

    /* Calculating from what number the checkers will begin.
    The first thread will begin from 0. The second checker will begin from the last number
    the first did. The third will begin from the sum of numbers checked by first and second, and so on. */

    thread thread0(noprint, ithread[0], 0);
    sum += ithread[0];
    thread thread1(noprint, ithread[1], sum);
    sum += ithread[1];
    thread thread2(noprint, ithread[2], sum);
    sum += ithread[2];
    thread thread3(noprint, ithread[3], sum);
    sum += ithread[3];
    thread thread4(noprint, ithread[4], sum);
    sum += ithread[4];
    thread thread5(noprint, ithread[5], sum);
    sum += ithread[5];
    thread thread6(noprint, ithread[6], sum);
    sum += ithread[6];
    thread thread7(noprint, ithread[7], sum);
    thread0.join();
    cout << "thread1";
    thread1.join();
    cout << "thread2";
    thread2.join();
    cout << "thread3";
    thread3.join();
    cout << "thread4";
    thread4.join();
    cout << "thread5";
    thread5.join();
    cout << "thread6";
    thread6.join();
    cout << "thread7";
    thread7.join();
    cout << "thread8";

}

The problem is that some threads finish before others, and this can be a big problem with large inputs. For example, 4 takes exactly twice as long as 2 to be checked, and 8 takes twice as long as 4. So, if I ask the program to check all the numbers up to 1 million, the first thread will check from 0 to 125000, a pretty easy task for today's CPUs. The second is going to check from 125000 to 250000, thus being twice as difficult, and so on.

Now I'm looking for two answers: 1. If you know, please tell me how to divide the load equally between threads. 2. Please explain how to make it so the user is able to choose the number of threads. I already worked out how to make thread selection possible for up to 64 threads (actually it could be done even for 1 trillion threads; it would just require a lot of IFs and a 1-trillion-element array). The problem is not in the code, it's in the math itself. I don't know how to divide the work equally for 8 cores, let alone for a variable number of cores.

Answer 1

Don't try to divide the work up all in one go at the beginning - threads are not predictable like that. You have no control over what other loads the OS will place on each core, and what may appear to you to be "equal" workloads could actually be very different depending on the code. Instead, divide the workload up into a large number of much smaller units, and have each thread start on the next one when it finishes the previous one, until all are done.

As for having the user specify the number of threads, what exactly are you stuck on? It seems a simple matter to ask the user for a number, and then spawn that many threads. However, most multi-threaded programs do not do this. It is better to query the system for how many threads it can run (e.g. std::thread::hardware_concurrency()), and use that.
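A minimal sketch of that idea, not a definitive implementation. The names chooseThreadCount and runOnThreads are made up for illustration; the job passed in stands in for the question's noprint. Keeping the threads in a std::vector is what removes the need for one named variable (or one IF) per thread.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Pick a thread count: the user's request if one was given (non-zero),
// otherwise whatever the hardware reports. hardware_concurrency() is
// allowed to return 0 when the value is unknown, so fall back to 1.
unsigned chooseThreadCount(unsigned requested) {
    if (requested > 0) return requested;
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? hw : 1;
}

// Spawn `count` threads in a loop instead of naming each one, run the
// (hypothetical) job on each with its index, and wait for all of them.
template <typename Job>
void runOnThreads(unsigned count, Job job) {
    assert(count > 0);
    std::vector<std::thread> pool;
    pool.reserve(count);
    for (unsigned i = 0; i < count; ++i)
        pool.emplace_back(job, i);   // constructs std::thread(job, i)
    for (auto& t : pool) t.join();
}
```

A call such as runOnThreads(chooseThreadCount(0), job) would then use one thread per hardware thread, while chooseThreadCount(8) would honour an explicit request for eight.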

Also, on another note, your algorithm for checking primes is extremely inefficient - presumably this is just a learning exercise and not serious code? If not, you may want to look at other algorithms - checking for primes is a well-studied problem.
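As one example of a cheaper check (a sketch, and only one of many known approaches): trial division can stop at the square root of n, because any composite n = a * b with a <= b must have a * a <= n. The function names here are illustrative and are not taken from the question's code.

```cpp
#include <cassert>

// Trial division that stops at sqrt(n) and skips even divisors.
bool isPrime(long long n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (long long d = 3; d * d <= n; d += 2)
        if (n % d == 0) return false;
    return true;
}

// Count the primes in the half-open range [lo, hi).
int countPrimes(long long lo, long long hi) {
    assert(lo <= hi);
    int count = 0;
    for (long long n = lo; n < hi; ++n)
        if (isPrime(n)) ++count;
    return count;
}
```

With this check the cost of testing n grows roughly with sqrt(n) rather than with n, so the imbalance between low and high ranges shrinks, although it does not disappear.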

But JBentley, if I do as you say, the operations will not be simultaneous. Yes, the application would use different threads, but in alternation, so what's the point in that? Wouldn't it be the same as using just one thread? I'm pretty new, so sorry if I'm wrong. – Alex

No, it would still be parallel. You have a shared data structure which tracks the last allocated chunk of work. This could be as simple as an int which contains the last number that was checked. When a thread runs out of work, it starts working on the next N numbers, and increments the int by the appropriate amount. When doing this you need to make sure that multiple threads can't use the shared variable at the same time. There are various mechanisms available in C++ (for example std::mutex or std::atomic) or in third-party libraries to manage this.

Pseudocode:

lastChecked = 0
thread 1: lock lastChecked
thread 1: lastChecked = 10
thread 1: unlock lastChecked
thread 1: start working on numbers 1 to 10
thread 2: lock lastChecked
thread 2: lastChecked = 20
thread 2: unlock lastChecked
thread 2: start working on numbers 11 to 20
thread 1: complete work
thread 1: lock lastChecked
thread 1: lastChecked = 30
thread 1: unlock lastChecked
thread 1: start working on numbers 21 to 30
// etc.
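The pseudocode above could be sketched in C++ roughly as follows. This is an illustration under assumptions, not the asker's code: ChunkDispenser, countPrimesParallel and isPrimeNaive are hypothetical names, and the primality test is a simple stand-in. Only the claiming of a chunk happens under the lock; the checking itself runs in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in primality test (hypothetical; any check could be used here).
bool isPrimeNaive(long long n) {
    if (n < 2) return false;
    for (long long d = 2; d < n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Shared state: the next number nobody has claimed yet, guarded by a mutex.
class ChunkDispenser {
public:
    ChunkDispenser(long long first, long long limit, long long chunk)
        : next_(first), limit_(limit), chunk_(chunk) { assert(chunk > 0); }

    // Claim the next chunk [begin, end). Returns false when no work is left.
    bool claim(long long& begin, long long& end) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (next_ >= limit_) return false;
        begin = next_;
        end = std::min(next_ + chunk_, limit_);
        next_ = end;
        return true;
    }

private:
    std::mutex mutex_;
    long long next_, limit_, chunk_;
};

// Count primes in [2, limit) with `threads` workers pulling chunks on demand.
long long countPrimesParallel(long long limit, unsigned threads, long long chunk) {
    assert(threads > 0);
    ChunkDispenser dispenser(2, limit, chunk);
    std::vector<long long> found(threads, 0);  // one slot per thread
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&dispenser, &found, t] {
            long long begin = 0, end = 0, local = 0;
            while (dispenser.claim(begin, end))
                for (long long n = begin; n < end; ++n)
                    if (isPrimeNaive(n)) ++local;
            found[t] = local;  // each thread writes only its own slot
        });
    }
    for (auto& th : pool) th.join();
    long long total = 0;
    for (long long c : found) total += c;
    return total;
}
```

A thread that draws cheap chunks simply comes back for more, so no thread sits idle while work remains. The chunk parameter is the work-unit size.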

Note: you should choose the size of each work unit carefully. Make it too large, and you start going back towards your original problem where some threads might finish a lot later than others. Make it too small, and you increase the risk of threads waiting around too much to access the shared state while other threads are using it, and you spend too much time on the overheads of allocating each workload.

Answer 2

Check out the pthreads library tutorial on this website:

https://computing.llnl.gov/tutorials/pthreads/

Answer 3

Check out the section on the boss-worker model.

http://www-01.ibm.com/software/network/dce/library/publications/appdev/html/APPDEV14.HTM#HDRWQ168

This thread model might apply very well to your problem.

Answer 4

One way to check for prime numbers uses three phases. The initial phase generates an array of prime numbers, up to some value p, using just one thread. The next phase uses the array to check for prime numbers up to p^2 (p squared), dividing the range from p to p^2 evenly between the threads. Each thread creates its own array of newly found prime numbers. After the threads complete, the arrays are concatenated to the original array, and p is set to the highest prime number found. Then the cycle is repeated until p^2 >= n. Then the final phase uses the array to check if n is prime.
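A rough sketch of how those three phases might look, assuming a tiny seed table of {2, 3} for the first phase. The function names (isPrimeByTable, extendTable, isPrimeThreePhase) are invented for this illustration, and the even split per round follows the description above rather than any tuned scheme.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// True if no table prime d with d*d <= n divides n. Correct only while
// n <= p*p, where p is the largest prime in the (ascending) table.
bool isPrimeByTable(long long n, const std::vector<long long>& primes) {
    for (long long d : primes) {
        if (d * d > n) break;
        if (n % d == 0) return false;
    }
    return true;
}

// Phase 2, one round: test (p, hi] in parallel and append what was found.
void extendTable(std::vector<long long>& primes, long long hi, unsigned threads) {
    assert(!primes.empty() && threads > 0);
    assert(hi <= primes.back() * primes.back());
    const long long lo = primes.back() + 1;
    if (hi < lo) return;
    const long long span = hi - lo + 1;
    std::vector<std::vector<long long>> found(threads);  // one array per thread
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        const long long begin = lo + span * t / threads;
        const long long end = lo + span * (t + 1) / threads;
        pool.emplace_back([&primes, &found, t, begin, end] {
            for (long long n = begin; n < end; ++n)
                if (isPrimeByTable(n, primes)) found[t].push_back(n);
        });
    }
    for (auto& th : pool) th.join();
    // Sub-ranges are ascending, so concatenating in order keeps the table sorted.
    for (auto& part : found)
        primes.insert(primes.end(), part.begin(), part.end());
}

bool isPrimeThreePhase(long long n, unsigned threads) {
    if (n < 2) return false;
    std::vector<long long> primes{2, 3};           // phase 1: seed table
    while (primes.back() * primes.back() < n) {    // phase 2: until p^2 >= n
        const long long p = primes.back();
        extendTable(primes, p * p, threads);
    }
    return isPrimeByTable(n, primes);              // phase 3: check n itself
}
```

The table is only read while the threads run, and each thread appends to its own array, so no locking is needed inside a round.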

4 answers

solution1
2 ACCEPTED 2014-07-10 23:56:40

solution2
0 2014-07-10 23:07:10

solution3
0 2014-07-10 23:27:00

solution4
0 2014-07-11 02:33:23
