Parallel execution takes more time than serial execution

Question

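I'm basically writing code to count the pairs whose sum is even (among all pairs from 1 to 100000). I wrote one version with pthreads and one without. But the pthreads version takes more time than the serial one. Here is my serial code: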

#include<bits/stdc++.h>
using namespace std;

int main()
{
  long long sum = 0, count = 0, n = 100000;
  auto start = chrono::high_resolution_clock::now();
  for(int i = 1; i <= n; i++)
    for(int j = i-1; j >= 0; j--)
    {
        sum = i + j;
        if(sum%2 == 0)
            count++;
    }
  cout<<"count is "<<count<<endl;

  auto end = chrono::high_resolution_clock::now();
  double time_taken = chrono::duration_cast<chrono::nanoseconds>(end - start).count();
  time_taken *= 1e-9;
  // setprecision must come before the value it formats
  cout << "Time taken by program is : " << fixed << setprecision(9) << time_taken << " secs" << endl;
  return 0;
}

Here is my parallel code:

#include<bits/stdc++.h>
#include<pthread.h>
using namespace std;
#define MAX_THREAD 3

long long cnt[5] = {0};
long long n = 100000;
int work_per_thread;
int start[] = {1, 60001, 83001, 100001};
void *count_array(void* arg)
{
   int t = *((int*)arg);
   long long sum = 0;
   for(int i = start[t]; i < start[t+1]; i++)
     for(int j = i-1; j >=0; j--)
     {
        sum = i + j;
            if(sum%2 == 0)
                cnt[t]++;
     }
   cout<<"thread"<<t<<" finished work "<<cnt[t]<<endl;
   return NULL;
}


int main()
{
    pthread_t threads[MAX_THREAD];
    int arr[] = {0,1,2};

    long long total_count = 0;
    work_per_thread = n/MAX_THREAD;

   auto start = chrono::high_resolution_clock::now();
   for(int i = 0; i < MAX_THREAD; i++)
       pthread_create(&threads[i], NULL, count_array, &arr[i]);

   for(int i = 0; i < MAX_THREAD; i++)
       pthread_join(threads[i], NULL);

   for(int i = 0; i < MAX_THREAD; i++)
       total_count += cnt[i];

   cout << "count is " << total_count << endl;

   auto end = chrono::high_resolution_clock::now();
   double time_taken = chrono::duration_cast<chrono::nanoseconds>(end - start).count();
   time_taken *= 1e-9;
   // setprecision must come before the value it formats
   cout << "Time taken by program is : " << fixed << setprecision(9) << time_taken << " secs" << endl;
   return 0;
}

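In the parallel code I create three threads: the first thread computes from 1 to 60000, the second from 60001 to 83000, and so on. I chose these numbers so that each thread does a roughly similar amount of computation. The parallel execution takes 10.3 seconds, whereas the serial execution takes 7.7 seconds. I have 6 cores with 2 threads per core. I also used the htop command to check that the expected number of threads is running, and it seems to work fine. I don't understand where the problem is.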

Answer 1

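All cores in the threaded version are contending for cnt[]. The per-thread counters sit next to each other in memory, so they most likely share a cache line, and every cnt[t]++ makes that line bounce between cores (false sharing).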

Use a local counter inside the loop, and copy the result into cnt[t] once the loop has finished.
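A minimal sketch of that change, keeping pthreads and the same loop as in the question. The names Task, count_range and count_even_pairs are hypothetical and not from the original post; the range boundaries are passed in rather than hard-coded. Each thread accumulates into a local variable and writes to shared memory exactly once, after its loops are done.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Hypothetical per-thread work description (not part of the original code).
struct Task {
    int begin;         // first i handled by this thread (inclusive)
    int end;           // one past the last i handled by this thread
    long long result;  // written once, after the loops finish
};

void* count_range(void* arg)
{
    Task* task = static_cast<Task*>(arg);
    long long local = 0;  // private to this thread, so no other core touches it
    for (int i = task->begin; i < task->end; i++)
        for (int j = i - 1; j >= 0; j--)
            if ((i + j) % 2 == 0)
                local++;
    task->result = local;  // the only write to memory shared with other threads
    return nullptr;
}

// bounds holds the range boundaries, e.g. {1, 60001, 83001, 100001}
// starts three threads, mirroring start[] in the question.
long long count_even_pairs(const std::vector<int>& bounds)
{
    assert(bounds.size() >= 2);
    const std::size_t num_threads = bounds.size() - 1;
    std::vector<Task> tasks(num_threads);
    std::vector<pthread_t> threads(num_threads);

    for (std::size_t t = 0; t < num_threads; t++) {
        tasks[t] = {bounds[t], bounds[t + 1], 0};
        pthread_create(&threads[t], nullptr, count_range, &tasks[t]);
    }

    long long total = 0;
    for (std::size_t t = 0; t < num_threads; t++) {
        pthread_join(threads[t], nullptr);
        total += tasks[t].result;
    }
    return total;
}
```

Calling count_even_pairs({1, 60001, 83001, 100001}) from main reproduces the three-way split from the question. The Task entries are still adjacent in memory, but since each one is written a single time per thread rather than once per inner-loop iteration, sharing a cache line should no longer matter in practice.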
