In an MPI code, I am trying to parallelize a simple loop with OpenACC, but the output is not what I expect. The loop contains a subroutine call, and I have added an 'acc routine seq' directive to the subroutine. If I manually inline the call and delete the subroutine, the result is correct. Am I using the OpenACC "routine" directive correctly, or is something else wrong?
MPI version: OpenMPI 4.0.5
NVIDIA HPC SDK: 20.11
CUDA version: 10.2
!The main program
program test
  use simple
  use mpi
  implicit none
  integer :: i,id,size,ierr,k,n1
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,id,ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
  allocate(type1(m))
  do i=1,m
    allocate(type1(i)%member(n))
    type1(i)%member=-1
    type1(i)%member(i)=i
  enddo
  !$acc update device(m,n)
  do k=1,m
    n1=0
    allocate(dev_mol(k:2*k))
    dev_mol=type1(k)%member(k:2*k)
    !$acc update device(dev_mol(k:2*k))
    !$acc parallel copy(n1) firstprivate(k)
    !$acc loop independent
    do i=k,2*k
      call test1(k,n1,i)
    enddo
    !$acc end parallel
    !$acc update self(dev_mol(k:2*k))
    type1(k)%member(k:2*k)=dev_mol
    write(*,"('k=',I3,' n1=',I2)") k,n1
    deallocate(dev_mol)
  enddo
  do i=1,m
    write(*,"('i=',I2,' member=',I3)") i,type1(i)%member(i)
    deallocate(type1(i)%member)
  enddo
  deallocate(type1)
  call MPI_Barrier(MPI_COMM_WORLD,ierr)
  call MPI_Finalize(ierr)
end
!Here is the module
module simple
  implicit none
  integer :: m=5,n=2**15
  integer,parameter :: p1=15
  integer,allocatable :: dev_mol(:)
  type type_related
    integer,allocatable :: member(:)
  end type
  type(type_related),allocatable :: type1(:)
  !$acc declare create(m,n,dev_mol)
  !$acc declare copyin(p1)
contains
  subroutine test1(k,n1,i)
    implicit none
    integer :: k,n1,i
    !$acc routine seq
    if(dev_mol(i)>0) then
      !write(*,*) 'gpu',k,n1,i
      n1=dev_mol(i)
      dev_mol(i)=p1
    else
      if(i==k)write(*,*) 'err',i,dev_mol(i)
    endif
  end
end
Compile command: mpif90 test.f90 -o test
Run command: mpirun -n 1 ./test
The result is as follows:
k= 1 n1= 1
k= 2 n1= 2
k= 3 n1= 3
k= 4 n1= 4
k= 5 n1= 5
i= 1 member= 15
i= 2 member= 15
i= 3 member= 15
i= 4 member= 15
i= 5 member= 15
Compile command: mpif90 test.f90 -o test -ta=tesla:cuda10.2 -Minfo=accel
Run command: mpirun -n 1 ./test
The incorrect result is as follows:
k= 1 n1= 0
k= 2 n1= 0
k= 3 n1= 0
k= 4 n1= 0
k= 5 n1= 0
i= 1 member= 1
i= 2 member= 2
i= 3 member= 3
i= 4 member= 4
i= 5 member= 5
The problem is that "i" is being passed by reference (the default in Fortran). The simplest solution is to pass it by value:
contains
  subroutine test1(k,n1,i)
    implicit none
    integer, value :: i
    integer :: n1, k
There is also a small compiler bug here: since "i" is the loop index variable, the compiler should be implicitly privatizing it. However, because it is passed by reference, it instead gets made shared across the loop iterations. We'll get this fixed in a future compiler version. In general, though, passing scalars by value whenever possible is advisable.
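For reference, applying the fix to the module from the question gives the following complete subroutine. Only the declaration of "i" changes; the body is unchanged:

```fortran
subroutine test1(k,n1,i)
  implicit none
  integer, value :: i   ! loop index passed by value, so each iteration gets its own copy
  integer :: n1, k
  !$acc routine seq
  if(dev_mol(i)>0) then
    n1=dev_mol(i)
    dev_mol(i)=p1
  else
    if(i==k)write(*,*) 'err',i,dev_mol(i)
  endif
end
```

With this change, the module variables dev_mol and p1 are still accessed through the enclosing !$acc declare directives, exactly as in the original code.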
Example run with the update:
% mpif90 test2.f90 -acc -Minfo=accel -V21.2 ; mpirun -np 1 a.out
test1:
16, Generating acc routine seq
Generating Tesla code
test:
48, Generating update device(m,n)
53, Generating update device(dev_mol(k:k*2))
54, Generating copy(n1) [if not already present]
Generating Tesla code
56, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
60, Generating update self(dev_mol(k:k*2))
k= 1 n1= 1
k= 2 n1= 2
k= 3 n1= 3
k= 4 n1= 4
k= 5 n1= 5
i= 1 member= 15
i= 2 member= 15
i= 3 member= 15
i= 4 member= 15
i= 5 member= 15