C OpenMP - Reduction scalability

I'm testing the performance speedup of some algorithms when using OpenMP, and one of them is not scaling. Am I doing something wrong?

PC Details:

  • Memory: 7,7 GiB
  • Processor: Intel® Core™ i7-4770 CPU @ 3.40GHz × 8
  • OS: Ubuntu 15.04 64-bit
  • gcc: gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

Code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(int argc, char **argv) {
  int test_size, i;
  double *vector, mean, stddeviation, start_time, duration;

  if (argc != 2) {
    printf("Usage: %s <test_size>\n", argv[0]);
    return 1;
  }

  srand((int) omp_get_wtime());

  test_size = atoi(argv[1]);
  printf("Test Size: %d\n", test_size);

  vector = (double *) malloc(test_size * sizeof(double));
  for (i = 0; i < test_size; i++) {
    vector[i] = rand();
  }

  start_time = omp_get_wtime();
  mean = 0;
  stddeviation = 0;
#pragma omp parallel default(shared) private(i)
  {
#pragma omp for reduction(+:mean)
    for (i = 0; i < test_size; i++) {
      mean += vector[i];
    }
#pragma omp single
    mean /= test_size;

#pragma omp for reduction(+:stddeviation)
    for (i = 0; i < test_size; i++) {
      stddeviation += (vector[i] - mean)*(vector[i] - mean);
    }
  }
  stddeviation = sqrt(stddeviation / test_size);
  duration = omp_get_wtime() - start_time;

  printf("Std. Deviation = %lf\n", stddeviation);
  printf("Duration: %fms\n", duration*1000);

  return 0;
}

Compilation line

gcc -c -o main.o main.c -fopenmp -lm -O3
gcc -o dp main.o -fopenmp -lm -O3

Results

$ OMP_NUM_THREADS=1 ./dp 100000000
166.224199ms

$ OMP_NUM_THREADS=2 ./dp 100000000
157.924034ms

$ OMP_NUM_THREADS=4 ./dp 100000000
159.056189ms

I cannot reproduce your results with Ubuntu 14.04.2 LTS, gcc 4.8, and a 2.3 GHz Intel Core i7. Here are the results that I get:

$ OMP_NUM_THREADS=1 ./so30627170 100000000
Test Size: 100000000
Std. Deviation = 619920018.463329
Duration: 206.301721ms
$ OMP_NUM_THREADS=2 ./so30627170 100000000
Test Size: 100000000
Std. Deviation = 619901821.463117
Duration: 110.381279ms
$ OMP_NUM_THREADS=4 ./so30627170 100000000
Test Size: 100000000
Std. Deviation = 619883614.594906
Duration: 78.241708ms

The output listed in the "Results" section of your question does not match what the listed code prints (the "Test Size" and "Std. Deviation" lines and the "Duration:" label are missing), so you may be running an old version of your code.

I thought about using x86 intrinsics within the parallel for loops, but the assembly output shows that gcc already uses SIMD instructions in this case. Without -march options, gcc used SSE2 instructions. When compiling with -march=native or -mavx, gcc used AVX instructions.
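For illustration only, a hand-written SSE2 version of the second loop's body might look like the sketch below. The function name `sum_sq_dev` is hypothetical and not part of the original program, and the scalar fallback is my own addition so that the code still builds on targets without SSE2. When reading the assembly, packed forms such as `mulpd` and `addpd` work on two doubles at once, while `mulsd` and `addsd` are SSE2 instructions that work on a single double.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum of squared deviations from `mean`.
   With SSE2 available, processes two doubles per iteration. */
double sum_sq_dev(const double *v, size_t n, double mean) {
  double total = 0.0;
  size_t i = 0;

#if defined(__SSE2__)
  __m128d vmean = _mm_set1_pd(mean);  /* mean in both lanes */
  __m128d acc = _mm_setzero_pd();     /* two partial sums */
  double lanes[2];

  for (; i + 2 <= n; i += 2) {
    __m128d d = _mm_sub_pd(_mm_loadu_pd(v + i), vmean);
    acc = _mm_add_pd(acc, _mm_mul_pd(d, d));
  }
  _mm_storeu_pd(lanes, acc);
  total = lanes[0] + lanes[1];
#endif

  /* Scalar tail, or the whole loop when SSE2 is unavailable. */
  for (; i < n; i++) {
    double d = v[i] - mean;
    total += d * d;
  }
  return total;
}
```

Because the two lanes accumulate separately, the summation order differs from the plain loop, so the last digits of the result can differ slightly for large inputs.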

EDIT: Running the Go version of your program, I get:

$ ./tcc-go-desvio-padrao -w 1 -n 15 -t 100000000
2015/06/07 08:26:43 Workers: 1
2015/06/07 08:26:43 Tests: [100000000]
2015/06/07 08:26:43 # of executions of each test: 15
2015/06/07 08:26:43 Time to allocate memory: 584.477µs
2015/06/07 08:26:43 ===========================================
2015/06/07 08:26:43 Current test size: 100000000
2015/06/07 08:27:05 Time to fill the array: 1.322556083s
2015/06/07 08:27:05 Time to calculate: 194.10728ms
$ ./tcc-go-desvio-padrao -w 2 -n 15 -t 100000000
2015/06/07 08:27:10 Workers: 2
2015/06/07 08:27:10 Tests: [100000000]
2015/06/07 08:27:10 # of executions of each test: 15
2015/06/07 08:27:10 Time to allocate memory: 565.273µs
2015/06/07 08:27:10 ===========================================
2015/06/07 08:27:10 Current test size: 100000000
2015/06/07 08:27:22 Time to fill the array: 677.755324ms
2015/06/07 08:27:22 Time to calculate: 113.095753ms
$ ./tcc-go-desvio-padrao -w 4 -n 15 -t 100000000
2015/06/07 08:27:28 Workers: 4
2015/06/07 08:27:28 Tests: [100000000]
2015/06/07 08:27:28 # of executions of each test: 15
2015/06/07 08:27:28 Time to allocate memory: 576.568µs
2015/06/07 08:27:28 ===========================================
2015/06/07 08:27:28 Current test size: 100000000
2015/06/07 08:27:34 Time to fill the array: 353.646193ms
2015/06/07 08:27:34 Time to calculate: 79.86221ms

The timings are about the same as those of the OpenMP version.
