How to safely parallelize a for-loop with memcpy inside
I am adding OpenMP support to the original serial code in the KSVD package. The original code works like im2col in MATLAB and extracts patches from an image:
/* n stands for the size of an image, sz stands for the patch size to extract */
int blocknum = 0;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        for (i=0; i<=n[0]-sz[0]; i+=1) {
            /* copy single block */
            for (m=0; m<sz[2]; m++) {
                for (l=0; l<sz[1]; l++) {
                    memcpy(b + blocknum*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }
            blocknum ++;
        }
    }
}
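For reference, here is the serial loop above wrapped as a self-contained function. The name `im2col_serial` and its return value (the number of patches written) are hypothetical. The sketch assumes that `x` holds `n[0]*n[1]*n[2]` doubles with the first index varying fastest (MATLAB's column-major order) and that `b` has room for one `sz[0]*sz[1]*sz[2]` column per patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper around the question's serial loop.
   x: image of n[0]*n[1]*n[2] doubles, first index fastest (column-major).
   b: output buffer, one column of sz[0]*sz[1]*sz[2] doubles per patch.
   Returns the number of patches written. */
static int im2col_serial(const double *x, const int n[3], const int sz[3], double *b)
{
    int i, j, k, l, m;
    int blocknum = 0;

    assert(sz[0] <= n[0] && sz[1] <= n[1] && sz[2] <= n[2]);
    for (k = 0; k <= n[2] - sz[2]; k++) {
        for (j = 0; j <= n[1] - sz[1]; j++) {
            for (i = 0; i <= n[0] - sz[0]; i++) {
                /* copy one patch: sz[1]*sz[2] contiguous runs of sz[0] doubles */
                for (m = 0; m < sz[2]; m++) {
                    for (l = 0; l < sz[1]; l++) {
                        memcpy(b + blocknum * sz[0] * sz[1] * sz[2] + m * sz[0] * sz[1] + l * sz[0],
                               x + (k + m) * n[0] * n[1] + (j + l) * n[0] + i,
                               sz[0] * sizeof(double));
                    }
                }
                blocknum++; /* serial counter: the next patch goes in the next column */
            }
        }
    }
    return blocknum;
}
```

Take a 4x3x1 image with 2x2x1 patches and `x[t] = t`. The function writes 6 patches, and the first one is `0, 1, 4, 5`.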
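Next, I want to parallelize this code. To do that, I replaced the incrementing counter blocknum with an index variable blockid computed from the loop indices.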
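The index replacement is sound in itself. The serial loops visit patches with `i` varying fastest, then `j`, then `k`, so `blocknum` always equals `i + j*ni + k*ni*nj`. Here `ni = n[0]-sz[0]+1` and `nj = n[1]-sz[1]+1` are the patch counts along the first two dimensions. Below is a minimal check of that identity. The helpers `patch_index` and `count_index_mismatches` are hypothetical names introduced for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: the closed-form patch index from the question,
   blockid = i + j*ni + k*ni*nj. */
static int patch_index(int i, int j, int k, const int n[3], const int sz[3])
{
    const int ni = n[0] - sz[0] + 1;  /* patch positions along dim 0 */
    const int nj = n[1] - sz[1] + 1;  /* patch positions along dim 1 */
    return i + j * ni + k * nj * ni;
}

/* Walk the patches in serial order (i fastest, then j, then k) and count
   how often the closed form disagrees with the running counter. */
static int count_index_mismatches(const int n[3], const int sz[3])
{
    int i, j, k, blocknum = 0, mismatches = 0;

    assert(sz[0] <= n[0] && sz[1] <= n[1] && sz[2] <= n[2]);
    for (k = 0; k <= n[2] - sz[2]; k++)
        for (j = 0; j <= n[1] - sz[1]; j++)
            for (i = 0; i <= n[0] - sz[0]; i++) {
                if (patch_index(i, j, k, n, sz) != blocknum)
                    mismatches++;
                blocknum++;
            }
    return mismatches;
}
```

Because the index is a pure function of `(i, j, k)`, each thread can compute it independently. That only works if the variables holding it are not shared between threads.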
/* n stands for the size of an image, sz stands for the patch size to extract */
int blockid3, blockid2, blockid;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    blockid3 = k * (n[1]-sz[1]+1) * (n[0]-sz[0]+1);
    #pragma omp parallel for
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        blockid2 = j * (n[0]-sz[0]+1);
        for (i=0; i<=n[0]-sz[0]; i+=1) {
            blockid = i + blockid2 + blockid3;
            /* copy single block */
            for (m=0; m<sz[2]; m++) {
                for (l=0; l<sz[1]; l++) {
                    memcpy(b + blockid*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }
        }
    }
}
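Running it, however, crashes with a fatal segmentation fault, and I don't understand why. The stack trace suggests a thread-safety problem, but I thought the parallel threads would never access the same address at the same time. Do I need to give the variables some attribute, such as static, shared or private? Here is the stack trace: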
Stack Trace (from fault):
[ 0] 0x00007f9bcaa695de /usr/local/MATLAB/R2011b/bin/glnxa64/libmwfl.so+00210398 _ZN2fl4diag15stacktrace_base7captureERKNS0_14thread_contextEm+000158
[ 1] 0x00007f9bcaa6b62d /usr/local/MATLAB/R2011b/bin/glnxa64/libmwfl.so+00218669
[ 2] 0x00007f9bcaa6b8f5 /usr/local/MATLAB/R2011b/bin/glnxa64/libmwfl.so+00219381 _ZN2fl4diag13terminate_logEPKcRKNS0_14thread_contextEb+000165
[ 3] 0x00007f9bc9a714f5 /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00447733 _ZN2fl4diag13terminate_logEPKcPK8ucontextb+000085
[ 4] 0x00007f9bc9a6e5b4 /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00435636
[ 5] 0x00007f9bc9a6f333 /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00439091
[ 6] 0x00007f9bc9a6f4c7 /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00439495
[ 7] 0x00007f9bc9a7085f /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00444511
[ 8] 0x00007f9bc9a70a15 /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00444949
[ 9] 0x00007f9bc89f0cb0 /lib/x86_64-linux-gnu/libpthread.so.0+00064688
[ 10] 0x00007f9bc876cb8e /lib/x86_64-linux-gnu/libc.so.6+01346446
[ 11] 0x00007f9b88238bb8 /home/peiyun/schmax3.0/test_im2col/mex_im2colstep.mexa64+00003000
[ 12] 0x00007f9bcb004eea /usr/lib/gcc/x86_64-linux-gnu/4.6.3//libgomp.so+00032490
[ 13] 0x00007f9bc89e8e9a /lib/x86_64-linux-gnu/libpthread.so.0+00032410
[ 14] 0x00007f9bc87164bd /lib/x86_64-linux-gnu/libc.so.6+00992445 clone+000109
By the way, can memcpy inside an omp for loop cause a race condition if the threads write to different addresses?
There are multiple data races in your code, namely:
/* n stands for the size of an image, sz stands for the patch size to extract */
int blockid3, blockid2, blockid;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    blockid3 = k * (n[1]-sz[1]+1) * (n[0]-sz[0]+1);
    #pragma omp parallel for
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        blockid2 = j * (n[0]-sz[0]+1);              // <--- here
        for (i=0; i<=n[0]-sz[0]; i+=1) {            // <--- here
            blockid = i + blockid2 + blockid3;      // <--- here
            /* copy single block */
            for (m=0; m<sz[2]; m++) {               // <--- here
                for (l=0; l<sz[1]; l++) {           // <--- and here
                    memcpy(b + blockid*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }
        }
    }
}
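Under the OpenMP data-sharing rules, only j, the loop variable of the parallelized loop, is made private automatically. The variables blockid2, i, blockid, m and l are declared outside the parallel region, so they are all implicitly shared, which is not what you want. You should either make them private, or better, declare them inside the parallel region, which makes them implicitly private: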
#pragma omp parallel for private(i,m,l,blockid,blockid2)
...
or
int blockid3;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    blockid3 = k * (n[1]-sz[1]+1) * (n[0]-sz[0]+1);
    #pragma omp parallel for
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        int blockid2 = j * (n[0]-sz[0]+1);
        for (int i=0; i<=n[0]-sz[0]; i+=1) {
            int blockid = i + blockid2 + blockid3;
            /* copy single block */
            for (int m=0; m<sz[2]; m++) {
                for (int l=0; l<sz[1]; l++) {
                    memcpy(b + blockid*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }
        }
    }
}
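The latter requires a C99-compliant compiler because of the way the loop variables are declared. Your GCC 4.6.3 needs the -std=c99 option to enable C99 compliance. If no such compiler is available (are non-C99 compilers still in common use?), add the private(i,l,m) clause instead. You might also want to move the parallelization to the outermost loop to minimize OpenMP overhead.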
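The sketch below combines the last two suggestions: it is written in C89 style and puts the parallel loop on the outermost `k` dimension. With that change, one parallel region covers the whole extraction instead of one region per `k`. It assumes the same data layout as the question. The function name `im2col_parallel_outer` and its return value are hypothetical. Because the parallel loop is now over `k`, `j` must also go into `private()` alongside `i`, `l` and `m`. `blockid` is declared at the top of a block inside the region, which C89 allows, so it is private automatically.

```c
#include <assert.h>
#include <string.h>

/* Sketch (hypothetical name): patch extraction with the parallel loop on k.
   C89 style: no loop-scoped declarations, so every loop variable except k
   (the parallel loop's own index, private by OpenMP rules) is listed in
   private(). x, b, n, sz and the counts below are shared and only read,
   except that each thread writes its own patches of b. */
static int im2col_parallel_outer(const double *x, const int n[3],
                                 const int sz[3], double *b)
{
    int i, j, k, l, m;
    const int ni = n[0] - sz[0] + 1;          /* patch positions along dim 0 */
    const int nj = n[1] - sz[1] + 1;          /* patch positions along dim 1 */
    const int nk = n[2] - sz[2] + 1;          /* patch positions along dim 2 */
    const int patch = sz[0] * sz[1] * sz[2];  /* doubles per patch */

    assert(ni > 0 && nj > 0 && nk > 0);

#pragma omp parallel for private(i, j, l, m)
    for (k = 0; k < nk; k++) {
        for (j = 0; j < nj; j++) {
            for (i = 0; i < ni; i++) {
                /* declared inside the region, hence private to each thread */
                int blockid = i + j * ni + k * nj * ni;
                for (m = 0; m < sz[2]; m++) {
                    for (l = 0; l < sz[1]; l++) {
                        memcpy(b + blockid * patch + m * sz[0] * sz[1] + l * sz[0],
                               x + (k + m) * n[0] * n[1] + (j + l) * n[0] + i,
                               sz[0] * sizeof(double));
                    }
                }
            }
        }
    }
    return ni * nj * nk;
}
```

With GCC, compile with `-fopenmp`. Without it, the pragma is ignored and the function runs serially with the same result. This also answers the memcpy question. Each patch is written to its own contiguous range of `b` (`patch` doubles starting at `blockid * patch`), and `x` is only read. Once the index variables are private, concurrent memcpy calls never write the same bytes. One caveat applies to 2-D images (`n[2] == sz[2] == 1`). There the `k` loop has a single iteration, so this version runs on one thread. To get parallelism in that case, keep the pragma on the `j` loop, or merge the `k` and `j` loops with OpenMP 3.0's `collapse(2)` clause.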