
Some questions about acc routine

I have an MPI code, and I am trying to parallelize one simple loop in it with OpenACC, but the output is not what I expect. The loop contains a subroutine call, and I added "!$acc routine seq" to that subroutine. If I manually inline the call and delete the subroutine, the result is correct. Am I using the OpenACC "routine" directive correctly, or is something else wrong?

  • Runtime environment

MPI version: OpenMPI 4.0.5
HPC SDK version: 20.11
CUDA version: 10.2

!The main program
program test
  use simple
  use mpi
  implicit none
  integer :: i,id,size,ierr,k,n1
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,id,ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)

  allocate(type1(m))
  do i=1,m
    allocate(type1(i)%member(n))
    type1(i)%member=-1
    type1(i)%member(i)=i
  enddo
  
  !$acc update device(m,n)
  do k=1,m
    n1=0
    allocate(dev_mol(k:2*k))
    dev_mol=type1(k)%member(k:2*k)
    !$acc update device(dev_mol(k:2*k))
    !$acc parallel copy(n1) firstprivate(k)
    !$acc loop independent
    do i=k,2*k
      call test1(k,n1,i)
    enddo
    !$acc end parallel
    !$acc update self(dev_mol(k:2*k))
    type1(k)%member(k:2*k)=dev_mol
    write(*,"('k=',I3,' n1=',I2)") k,n1
    deallocate(dev_mol)
  enddo
  
  do i=1,m
    write(*,"('i=',I2,' member=',I3)") i,type1(i)%member(i)
    deallocate(type1(i)%member)
  enddo
  deallocate(type1)
  call MPI_Barrier(MPI_COMM_WORLD,ierr)
  call MPI_Finalize(ierr)
end


!Here is the module
module simple
  implicit none
  integer :: m=5,n=2**15
  integer,parameter :: p1=15
  integer,allocatable :: dev_mol(:)
  type type_related
    integer,allocatable :: member(:)
  end type
  type(type_related),allocatable :: type1(:)
  
  !$acc declare create(m,n,dev_mol)
  !$acc declare copyin(p1)
  contains
    subroutine test1(k,n1,i)
      implicit none
      integer :: k,n1,i
      !$acc routine seq
      if(dev_mol(i)>0) then
        !write(*,*) 'gpu',k,n1,i
        n1=dev_mol(i)
        dev_mol(i)=p1
      else
        if(i==k)write(*,*) 'err',i,dev_mol(i)
      endif
    end
end
  • MPI only

Compile command: mpif90 test.f90 -o test
Run command: mpirun -n 1 ./test
The result is as follows:

k=  1 n1= 1
k=  2 n1= 2
k=  3 n1= 3
k=  4 n1= 4
k=  5 n1= 5
i= 1 member= 15
i= 2 member= 15
i= 3 member= 15
i= 4 member= 15
i= 5 member= 15
  • MPI+OpenACC

Compile command: mpif90 test.f90 -o test -ta=tesla:cuda10.2 -Minfo=accel
Run command: mpirun -n 1 ./test
The incorrect result is as follows:

k=  1 n1= 0
k=  2 n1= 0
k=  3 n1= 0
k=  4 n1= 0
k=  5 n1= 0
i= 1 member= 1
i= 2 member= 2
i= 3 member= 3
i= 4 member= 4
i= 5 member= 5

The problem is that "i" is passed by reference (the default in Fortran). The simplest solution is to pass it by value:

  contains
    subroutine test1(k,n1,i)
      implicit none
      integer, value :: i
      integer :: n1, k

There is also a small compiler bug here: because "i" is the loop index variable, the compiler should implicitly privatize it. However, because it is passed by reference, the compiler instead makes it shared, so all iterations use the same copy of "i". We'll get this fixed in a future compiler version. Even so, passing scalars by value whenever possible is generally advisable.
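Below is a minimal self-contained sketch of the fix. It assumes the same data layout as the question. Both scalar inputs (k and i) are passed by value, and n1 stays by reference because the routine has to write it back to the caller. The module name simple_fixed and the driver routine run_k are hypothetical and exist only for illustration. If you compile without -acc (for example with plain gfortran), the !$acc lines are treated as comments and the loop runs on the host, which makes it easy to check the logic.

```fortran
! Sketch only: simple_fixed and run_k are hypothetical names, not from the question.
module simple_fixed
  implicit none
  integer, parameter :: p1 = 15
  integer, allocatable :: dev_mol(:)
  !$acc declare create(dev_mol)
contains
  subroutine test1(k, n1, i)
    integer, value :: k, i   ! by value: each call gets its own copy of the index
    integer :: n1            ! by reference: the result is written back to the caller
    !$acc routine seq
    if (dev_mol(i) > 0) then
      n1 = dev_mol(i)
      dev_mol(i) = p1
    end if
  end subroutine test1

  ! Hypothetical driver: runs the offloaded loop for one value of k,
  ! mirroring the body of the k-loop in the question's main program.
  subroutine run_k(k, member, n1)
    integer, intent(in) :: k
    integer, intent(inout) :: member(:)
    integer, intent(out) :: n1
    integer :: i
    n1 = 0
    allocate(dev_mol(k:2*k))
    dev_mol = member(k:2*k)
    !$acc update device(dev_mol(k:2*k))
    !$acc parallel loop copy(n1)
    do i = k, 2*k
      call test1(k, n1, i)
    end do
    !$acc update self(dev_mol(k:2*k))
    member(k:2*k) = dev_mol
    deallocate(dev_mol)
  end subroutine run_k
end module simple_fixed
```

Only the iteration with i == k finds a positive value in dev_mol, so exactly one iteration writes n1. If several iterations could write it, the update of n1 would be a race and would need an atomic or a reduction instead.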

Example run with this change:

% mpif90 test2.f90 -acc -Minfo=accel -V21.2 ; mpirun -np 1 a.out
test1:
     16, Generating acc routine seq
         Generating Tesla code
test:
     48, Generating update device(m,n)
     53, Generating update device(dev_mol(k:k*2))
     54, Generating copy(n1) [if not already present]
         Generating Tesla code
         56, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     60, Generating update self(dev_mol(k:k*2))
k=  1 n1= 1
k=  2 n1= 2
k=  3 n1= 3
k=  4 n1= 4
k=  5 n1= 5
i= 1 member= 15
i= 2 member= 15
i= 3 member= 15
i= 4 member= 15
i= 5 member= 15
