
Pthread is slower than serial version

I have a function that rearranges some data in a table of N particles, and I am trying to convert it to a parallel version. The serial version is the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


#define DIM 3


void data_rearrangement(float *Y, float *X, unsigned int *permutation_vector, int N){

      for(int i=0; i<N; i++){
          memcpy(&Y[i*DIM], &X[permutation_vector[i]*DIM], DIM*sizeof(float));
      }
}

The parallel version that I have made is the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "utils.h"
#include <pthread.h>

#define DIM 3

struct data{
    float *Y;
    float *X;
    unsigned int *permutation_vector;
    int N;
};

void *threaded_data_rearrangement(void *args){

  struct data *new_data;
  new_data = (struct data *) args;

  for(int i=0; i<new_data->N/NUM_THREADS; i++){
    memcpy(&new_data->Y[i*DIM], &new_data->X[new_data->permutation_vector[i]*DIM], DIM*sizeof(float));
  }

}

void data_rearrangement(float *Y, float *X, unsigned int *permutation_vector, int N){

  void *status;
  int rc;
  struct data data_array[NUM_THREADS];
  pthread_t threads[NUM_THREADS];
  int k;

  for (k=0; k<NUM_THREADS; k++){
    data_array[k].Y = Y;
    data_array[k].X = X;
    data_array[k].permutation_vector = permutation_vector;
    data_array[k].N = N;

    pthread_create(&threads[k], NULL, threaded_data_rearrangement, (void *)&data_array[k]);
  }

  for(k=0; k<NUM_THREADS; k++){
    rc = pthread_join(threads[k], &status);
    if (rc) {
      printf("ERROR; return code from pthread_join() is %d\n",rc);
      exit(-1);
    }
  }

}

Can someone explain to me why the parallel version is slower? Note that NUM_THREADS has been declared as a global variable in my whole project. I can't think of any other way to make this function parallel. Thank you in advance :)

I am not sure what you are trying to accomplish (in terms of time), since you did not include a main function. Here is a working main function:

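#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DIM 3          // same value as in the question
#define NUM_THREADS 8

// data_rearrangement (parallel) and serial_data_rearrangement (the serial
// version, renamed) are assumed to be defined above this point

// wall-clock time in seconds; clock() returns CPU time, which for a
// multithreaded process adds up the time of all threads
static double wall_seconds(void) {
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

int main() {
   
   // number of particles (change accordingly)
   int sz = 1e7;
   unsigned int *pv;
   float *X,*Y;
   float a;
   int i;
   
   // allocate the arrays: X and Y hold DIM floats per particle
   X = (float*)malloc((size_t)sz * DIM * sizeof(float));
   Y = (float*)malloc((size_t)sz * DIM * sizeof(float));
   pv = (unsigned int*)malloc((size_t)sz * sizeof(unsigned int));
   if (X == NULL || Y == NULL || pv == NULL) {
     printf("ERROR; malloc failed\n");
     return 1;
   }
   
   // fill with random numbers
   for(i=0;i<sz*DIM;i++){
     a = 1 + rand()%100;   // never zero, so the division below is safe
     X[i] = (float)rand()/(float)(RAND_MAX/a);
     Y[i] = (float)rand()/(float)(RAND_MAX/a);
   }
   for(i=0;i<sz;i++){
     pv[i] = rand()%100;   // any index in [0, sz) is valid
   }
   
   // measure parallel version execution time
   double t = wall_seconds();
   printf("Timer starts \n");
   data_rearrangement(Y,X,pv,sz);
   double time_taken = wall_seconds() - t; // calculate the elapsed time
   printf("Parallel took %f seconds to execute \n", time_taken);
   
   // measure serial version execution time
   t = wall_seconds();
   printf("Timer starts \n");
   serial_data_rearrangement(Y,X,pv,sz);
   time_taken = wall_seconds() - t; // calculate the elapsed time
   printf("Serial took %f seconds to execute \n", time_taken);
   
   free(X);
   free(Y);
   free(pv);
   return 0;
}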
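
One thing worth checking before comparing timings: in the question's threaded_data_rearrangement, every thread loops over i from 0 to N/NUM_THREADS, so all threads copy the same first block of particles and the rest of Y is never written. Below is a minimal sketch of one way to give each thread its own disjoint range. The names struct chunk, chunk_worker and chunked_data_rearrangement are hypothetical (they are not from the question), and NUM_THREADS is assumed to be a compile-time constant, as in the main function above.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define DIM 3
#define NUM_THREADS 8   /* assumption: a compile-time constant */

/* Hypothetical per-thread argument: the question's struct plus a
   half-open particle range [begin, end) owned by this thread. */
struct chunk {
    float *Y;
    const float *X;
    const unsigned int *permutation_vector;
    int begin;   /* first particle this thread copies */
    int end;     /* one past the last particle this thread copies */
};

/* Copies only the particles in [begin, end), so no two threads
   write to the same part of Y. */
static void *chunk_worker(void *arg) {
    const struct chunk *c = (const struct chunk *)arg;
    for (int i = c->begin; i < c->end; i++) {
        memcpy(&c->Y[(size_t)i * DIM],
               &c->X[(size_t)c->permutation_vector[i] * DIM],
               DIM * sizeof(float));
    }
    return NULL;
}

/* Returns 0 on success, otherwise the first pthread error code seen. */
int chunked_data_rearrangement(float *Y, const float *X,
                               const unsigned int *permutation_vector, int N) {
    struct chunk chunks[NUM_THREADS];
    pthread_t threads[NUM_THREADS];
    int base = N / NUM_THREADS;   /* particles every thread gets */
    int rest = N % NUM_THREADS;   /* leftovers, one each to the first threads */
    int begin = 0;
    int started = 0;
    int rc = 0;

    for (int k = 0; k < NUM_THREADS; k++) {
        int len = base + (k < rest ? 1 : 0);
        chunks[k] = (struct chunk){Y, X, permutation_vector, begin, begin + len};
        begin += len;
        rc = pthread_create(&threads[k], NULL, chunk_worker, &chunks[k]);
        if (rc != 0) break;       /* stop creating, but still join the rest */
        started++;
    }
    for (int k = 0; k < started; k++) {
        int jrc = pthread_join(threads[k], NULL);
        if (rc == 0) rc = jrc;
    }
    return rc;
}
```

Calling chunked_data_rearrangement(Y, X, pv, N) in place of data_rearrangement should produce the same Y as the serial loop. Even with disjoint ranges the speedup may be modest: the loop does little besides copying memory, and creating and joining threads has a cost of its own, so for small N the serial version can still win.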
