Segmentation fault on exit of OpenMP code
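I have a C++ code that uses OpenMP. It is linked to a Fortran 90 code. If I run it with only one thread, everything works fine. If I run it with any number of threads other than 1, I get a segmentation fault when exiting the C++ part. The results of the code are accurate, with no errors. It runs smoothly until it has to exit. The OpenMP-related part of the code is: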
#pragma omp parallel for shared(even_phi,odd_phi,odd_divisor,odd_start_index,odd_iter_index) private(ii,jj,kk,cc,io,pp,f1,f2,f3,f4,f5,f6,ff,tmp_phi) schedule(static)
for (kk=1; kk<nz-1; kk++)
{
    cc = (kk-1)*(ny-2);
    for (jj=1; jj<ny-1; jj++)
    {
        io = odd_start_index[cc];
        pp = odd_iter_index[cc++];
        for (ii=io; ii<maxElem; ii++)
        {
            f1 = even_phi[pp-odown];
            f2 = even_phi[pp-oright];
            f3 = even_phi[pp];
            tmp_phi = odd_phi[pp];
            f4 = even_phi[pp+1];
            f5 = even_phi[pp+oleft];
            f6 = even_phi[pp+oup];
            ff = f1+f2+f3+f4+f5+f6;
            odd_phi[pp] = odd_divisor[pp]*ff + c2*tmp_phi;
            pp++;
        }
    }
}
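For readers who want to experiment with the same loop pattern in isolation, here is a minimal sketch of a red-black style sweep. It is not the solver above: the function name `odd_sweep`, the `row_start`, `row_len` and `stride` parameters, and the five-point neighbourhood are all assumptions made for illustration. Temporaries are declared inside the loop body, so they are private to each thread without a long `private(...)` list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified sweep (illustration only, not the original solver).
// Each row r updates odd_phi[pp] for row_len consecutive indices starting at
// row_start[r], reading only even_phi and the old value of odd_phi[pp].
// The caller must make sure that pp - stride and pp + stride stay in bounds
// and that the rows do not overlap, otherwise threads would race on odd_phi.
void odd_sweep(const std::vector<double>& even_phi,
               std::vector<double>& odd_phi,
               const std::vector<double>& odd_divisor,
               const std::vector<long>& row_start,
               long row_len, long stride, double c2)
{
    const long nrows = static_cast<long>(row_start.size());

    // Variables declared inside the loop are automatically private per thread.
    #pragma omp parallel for schedule(static)
    for (long r = 0; r < nrows; ++r) {
        long pp = row_start[r];
        for (long i = 0; i < row_len; ++i, ++pp) {
            const double ff = even_phi[pp - stride] + even_phi[pp - 1]
                            + even_phi[pp]
                            + even_phi[pp + 1] + even_phi[pp + stride];
            odd_phi[pp] = odd_divisor[pp] * ff + c2 * odd_phi[pp];
        }
    }
}
```

If the file is compiled without OpenMP support, the pragma is ignored and the same function runs serially, which makes it easy to compare single-threaded and multi-threaded results.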
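This is a standard numerical solver code. It works perfectly without OpenMP and with OMP_NUM_THREADS=1. If it is executed with more threads, then after almost completing a normal run, Valgrind says: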
==23723== Thread 20:
==23723== Jump to the invalid address stated on the next line
==23723== at 0x2A6EBBB8: ???
==23723== by 0x2A6EA515: ???
==23723== Address 0x2a6ebbb8 is not stack'd, malloc'd or (recently) free'd
==23723==
==23723==
==23723== Process terminating with default action of signal 11 (SIGSEGV)
==23723== Access not within mapped region at address 0x2A6EBBB8
==23723== at 0x2A6EBBB8: ???
==23723== by 0x2A6EA515: ???
==23723== If you believe this happened as a result of a stack
==23723== overflow in your program's main thread (unlikely but
==23723== possible), you can try to increase the size of the
==23723== main thread stack using the --main-stacksize= flag.
==23723== The main thread stack size used in this run was 1048576.
==23723==
==23723== HEAP SUMMARY:
==23723== in use at exit: 632,995,339 bytes in 101 blocks
==23723== total heap usage: 10,071 allocs, 9,970 frees, 1,257,933,743 bytes allocated
==23723==
==23723== Thread 1:
==23723== 6,992 bytes in 23 blocks are possibly lost in loss record 47 of 74
==23723== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==23723== by 0x35A0E11812: _dl_allocate_tls (dl-tls.c:300)
==23723== by 0x35A1E07068: pthread_create@@GLIBC_2.2.5 (allocatestack.c:571)
==23723== by 0x2A6EA981: ???
==23723== by 0x2A4C666E: ???
==23723== by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723== by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723== by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723==
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 50 of 74
==23723== at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723== by 0x2A4C6394: ???
==23723== by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723== by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723== by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723==
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 51 of 74
==23723== at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723== by 0x2A4C63BF: ???
==23723== by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723== by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723== by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723==
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 52 of 74
==23723== at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723== by 0x2A4C63EA: ???
==23723== by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723== by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723== by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723==
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 53 of 74
==23723== at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723== by 0x2A4C6415: ???
==23723== by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723== by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723== by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723==
==23723== 39,232 bytes in 1 blocks are definitely lost in loss record 57 of 74
==23723== at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723== by 0x2A4C6369: ???
==23723== by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723== by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723== by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723==
==23723== LEAK SUMMARY:
==23723== definitely lost: 160,336 bytes in 5 blocks
==23723== indirectly lost: 0 bytes in 0 blocks
==23723== possibly lost: 6,992 bytes in 23 blocks
==23723== still reachable: 632,828,011 bytes in 73 blocks
==23723== suppressed: 0 bytes in 0 blocks
==23723== Reachable blocks (those to which a pointer was found) are not shown.
==23723== To see them, rerun with: --leak-check=full --show-reachable=yes
==23723==
==23723== For counts of detected and suppressed errors, rerun with: -v
==23723== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 6 from 6)
gdb says:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5a04700 (LWP 23837)]
0x00007ffff7024bc2 in ?? ()
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.6-4.el6.x86_64 libgfortran-4.4.6-4.el6.x86_64 libgomp-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64
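That is obviously not helpful. I have been playing with GOMP_STACKSIZE and the number of threads, thinking I might have a stack size problem, but to no avail.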
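When varying the thread count like this, it can help to confirm how many threads the runtime actually starts. The following is a small sketch for that check; the function name `team_size` is hypothetical, and the `_OPENMP` guards are there so the code also builds when OpenMP is disabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns the number of threads in a parallel region,
// or 1 when the code is built without OpenMP support.
int team_size()
{
    int n = 1;
    #pragma omp parallel
    {
        // Only one thread of the team writes the shared result.
        #pragma omp single
        {
#ifdef _OPENMP
            n = omp_get_num_threads();
#endif
        }
    }
    return n;
}
```

Calling `team_size()` once at startup and printing the value shows whether settings such as OMP_NUM_THREADS are being picked up by the process at all.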
I am missing something. Probably something silly. And I cannot find it.
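This is a bug in GCC. I found a GCC bug report about a problem related to using OpenMP together with the iso_c_binding module. After that, I compiled and ran the code with the Intel compiler and had no problems.
My code is very long, and I do not know how to isolate the problematic part in order to reproduce the error and report it. I will try my best to do so.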
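As a possible starting point for isolating the problem, here is a sketch of a very small C++ routine with C linkage that uses OpenMP and could be called from a short Fortran driver through iso_c_binding. The name `omp_sum_squares` and its body are assumptions chosen for illustration. This is not known to trigger the bug; it is only a skeleton to grow, piece by piece, toward the failing configuration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical entry point with C linkage. A Fortran caller would declare a
// matching interface with bind(C, name="omp_sum_squares").
extern "C" double omp_sum_squares(int n)
{
    if (n <= 0) {
        return 0.0;
    }

    // std::vector releases its memory on return, so no new[]/delete[] pairs
    // have to be tracked while hunting for the crash.
    std::vector<double> v(static_cast<std::size_t>(n));

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        v[i] = static_cast<double>(i) * i;
    }

    double s = 0.0;
    #pragma omp parallel for reduction(+:s) schedule(static)
    for (int i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```

Building this with the same compiler and flags as the real solver, and running it with more than one thread, would show whether the crash on exit depends on the solver itself or only on the OpenMP and Fortran/C++ linkage combination.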
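I am using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4) on CentOS release 6.3 (Final).
I am marking this as the answer. If I find anything more useful later, I will post it here, since it may be useful to others.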