
OpenMP not starting threads on one machine but working fine on another running the same OS

I recently succeeded in parallelizing a fairly large program written in Fortran that uses some libraries written in C (most notably UMFPACK). We compiled everything with Intel's C and Fortran compilers (icc and ifort), version 14.0, and we run Ubuntu 12.04.3.

I made all routines thread-safe and used the code below to parallelize the loop with OpenMP:

!$omp parallel do default(shared) private(gs,ibk,ij) schedule(dynamic)
do ibk=1,numcell

    call CellGaussPoints(ibk,numcell,nquado,numq,numgauss, &
        xc,noCell,gauss,gs)

    do ij=1,numgauss

        gs_3D(ibk,1,ij)=gs(1,ij)
        gs_3D(ibk,2,ij)=gs(2,ij)
        gs_3D(ibk,3,ij)=gs(3,ij)
        gs_3D(ibk,4,ij)=gs(4,ij)

        call SearchMaterial(tree3,my_array0,node,gs_3D(ibk,1,ij),gs_3D(ibk,2,ij), &
            numnode,mat_2D(ibk,ij),nf,numd,elements)

    end do

end do
!$omp end parallel do

It works well when compiled with -openmp, but not on every PC. gs_3D is a three-dimensional array used to store SearchMaterial's results.

I have a Core i5-2400 and tested the program both in a VMware virtual machine running Linux (on a Windows host) and on my native Linux install. It worked fine on both. But on another PC (a Core i7-3860X), also running Ubuntu 12.04.3 with the same compiler and libraries installed, it only runs with one thread. The compile options are identical. I even tried running the binary compiled on my PC on the other machine.

Moreover, OpenBLAS's OpenMP build worked fine on my native Linux installation but not in my virtual machine or on the i7-3860X.

After some research, which produced nothing, I decided to ask for help.

(OMP_NUM_THREADS was properly set in all these cases)

ulimit -a returns the following

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63687
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63687
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I usually run ulimit -s unlimited before starting the program, since otherwise I get a segmentation fault.

OMP_THREAD_LIMIT was not set on the machine where my code doesn't work.

EDIT: As for the BLAS problem, I discovered that building OpenBLAS without processor affinity makes it use all cores. My program, on the other hand, still doesn't work on the i7.
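To narrow down where the two machines differ, a small stand-alone program can print what the OpenMP runtime itself sees. This is a hypothetical diagnostic, not part of the original code. It uses only standard omp_lib routines, and because the OpenMP-specific lines use the !$ conditional-compilation sentinel, it also builds (and reports a single thread) when compiled without -openmp. Running the same binary on the i5 and the i7 and comparing the output should show whether the difference lies in the requested thread count, in the number of processors the runtime believes it may use (in many implementations this follows the process's CPU affinity mask), or in dynamic adjustment.

```fortran
! Hypothetical diagnostic, not from the original program: prints the OpenMP
! runtime's view of the environment. Build with ifort -openmp (or gfortran
! -fopenmp); the !$ lines are ignored when OpenMP is disabled.
program omp_env_report
!$ use omp_lib
  implicit none
  integer :: max_threads, num_procs, thread_limit, team_size
  logical :: dynamic_on

  ! Defaults used when the program is built without OpenMP
  max_threads  = 1
  num_procs    = 1
  thread_limit = 1
  team_size    = 1
  dynamic_on   = .false.

!$ max_threads  = omp_get_max_threads()   ! initialised from OMP_NUM_THREADS
!$ num_procs    = omp_get_num_procs()     ! processors available to the program
!$ thread_limit = omp_get_thread_limit()  ! initialised from OMP_THREAD_LIMIT
!$ dynamic_on   = omp_get_dynamic()       ! initialised from OMP_DYNAMIC

  ! Number of threads a parallel region actually receives on this machine
!$omp parallel
!$omp single
!$ team_size = omp_get_num_threads()
!$omp end single
!$omp end parallel

  print '(a,i0)', 'omp_get_max_threads()        = ', max_threads
  print '(a,i0)', 'omp_get_num_procs()          = ', num_procs
  print '(a,i0)', 'omp_get_thread_limit()       = ', thread_limit
  print '(a,l1)', 'omp_get_dynamic()            = ', dynamic_on
  print '(a,i0)', 'threads in a parallel region = ', team_size
end program omp_env_report
```

If omp_get_num_procs() reports 1 on the i7 while the i5 reports all its cores, that would point at a restricted affinity mask rather than at the Fortran code itself.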

Try setting the environment variable OMP_DYNAMIC to FALSE. When it is TRUE, the runtime is allowed to adjust the number of threads in a parallel region, for example reducing it if it decides the CPUs are too busy.
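Besides exporting OMP_DYNAMIC=FALSE in the shell, the same setting can be applied from inside the program with omp_set_dynamic. The sketch below is an assumption about how one might do this, not code from the question: it disables dynamic adjustment before a parallel region and then compares the team size the runtime actually created with omp_get_max_threads(). As before, the !$ sentinels let it compile without OpenMP.

```fortran
! Hypothetical sketch: disable dynamic thread adjustment from inside the
! program (same effect as OMP_DYNAMIC=FALSE) and check the resulting team size.
program omp_no_dynamic
!$ use omp_lib
  implicit none
  integer :: requested, team_size

  ! Values reported when built without OpenMP
  requested = 1
  team_size = 1

  ! A run-time call overrides the initial value read from OMP_DYNAMIC,
  ! so the environment of a particular machine no longer decides this.
!$ call omp_set_dynamic(.false.)
!$ requested = omp_get_max_threads()

!$omp parallel
!$omp single
!$ team_size = omp_get_num_threads()
!$omp end single
!$omp end parallel

  print '(a,i0,a,i0)', 'requested ', requested, ' threads, got ', team_size
  if (team_size < requested) then
     print '(a)', 'warning: team is smaller than requested even with dynamic adjustment off'
  end if
end program omp_no_dynamic
```

In the original program, the omp_set_dynamic call would go somewhere before the !$omp parallel do loop. If the team is still a single thread with dynamic adjustment switched off, the cause presumably lies elsewhere in that machine's environment.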
