
Threads failing to affect performance

Below is a small program meant to parallelize the approximation of the 1/(n^2) series. Note the global parameter NUM_THREADS.

My issue is that increasing the number of threads from 1 to 4 (my computer has 4 processors) does not significantly affect the results of my timing experiments. Do you see a logical flaw in ThreadFunction? Is there false sharing or misplaced blocking that ends up serializing the execution?

#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <string>
#include <future>
#include <chrono>

std::mutex sum_mutex;           // This mutex is for the sum vector
std::vector<double> sum_vec;    // This is the sum vector
int NUM_THREADS = 1;
int UPPER_BD = 1000000;

/* Thread function */
void ThreadFunction(std::vector<double> &l, int beg, int end, int thread_num)
{
    double sum = 0;
    for(int i = beg; i < end; i++) sum += (1 / ( l[i] * l[i]) );
    std::unique_lock<std::mutex> lock1 (sum_mutex, std::defer_lock);
    lock1.lock();
    sum_vec.push_back(sum);
    lock1.unlock();
}

void ListFill(std::vector<double> &l, int z)
{
    for(int i = 0; i < z; ++i) l.push_back(i);
}

int main() 
{
    std::vector<double> l;
    std::vector<std::thread> thread_vec;

    ListFill(l, UPPER_BD);
    int len = l.size();

    int lower_bd = 1;
    int increment = (UPPER_BD - lower_bd) / NUM_THREADS;
    for (int j = 0; j < NUM_THREADS; ++j)
    {
        thread_vec.push_back(std::thread(ThreadFunction, std::ref(l), lower_bd, lower_bd + increment, j));
        lower_bd += increment;
    }

    for (auto &t : thread_vec) t.join();
    double big_sum = 0;
    for (double z : sum_vec) big_sum += z;

    std::cout << big_sum << std::endl;

    return 0;
}

From looking at your code, I suspect that ListFill is taking longer than ThreadFunction. Why pass a list of values to the thread instead of the bounds each thread should loop over? Something like:

void ThreadFunction( int beg, int end ) {
    double sum = 0.0;
    for(double i = beg; i < end; i++) 
         sum += (1.0 / ( i * i) );
    std::unique_lock<std::mutex> lock1 (sum_mutex);
    sum_vec.push_back(sum);
}

To maximize parallelism, you need to push as much work as possible onto the threads. See Amdahl's Law.

In addition to dohashi's nice improvement, you can remove the need for the mutex by populating the sum_vec in advance in the main thread:

sum_vec.resize(NUM_THREADS);

then writing directly to it in ThreadFunction:

sum_vec[thread_num] = sum;

Since each thread writes to a distinct element and never resizes the vector itself, there is no need to lock anything.
