How to allocate arrays of arrays in structure with CUDA Fortran?
With CUDA Fortran, I'm trying to allocate arrays inside a structure, but I'm running into a problem and I don't know why. Here is a short program (stored in a file called struct.cuf) that reproduces it. I'm compiling with PGI version 16.10, using the following options: -O3 -Mcuda=cc60 -tp=x64 struct.cuf -o struct_out

    module structure
    contains
      type mytype
        integer :: alpha,beta,gamma
        real,dimension(:),pointer :: a
      end type mytype
      type mytypeDevice
        integer :: alpha,beta,gamma
        real,dimension(:),pointer,device :: a
      end type mytypeDevice
    end module structure

    program main
      use cudafor
      use structure
      type(mytype) :: T(3)
      type(mytypeDevice),device :: T_Device(3)
      ! For the host
      do i=1,3
        allocate(T(i)%a(10))
      end do
      T(1)%a=1; T(2)%a=2; T(3)%a=3
      ! For the device
      print *, 'Everything from now is ok'
      do i=1,3
        allocate(T_Device(i)%a(10))
      end do
      !do i=1,3
      !  T_Device(i)%a=T(i)%a
      !end do
    end program main

The output:

    Everything from now is ok
    Segmentation fault

What am I doing wrong here? The only working solution I have found is to store the values in separate arrays and transfer those to the GPU, but that is very heavy, especially if I use a lot of structures like mytype.
EDIT: The code has been modified to use Vladimir F's solution. If I remove the device attribute from the T_Device(3) declaration, then the allocation seems to work, and so does assigning the values (the commented-out lines below the allocation). But I need the device attribute on T_Device(3), because I'm going to use it in kernels.
Thanks!
I think you need a device pointer:

    type mytype_device
      ...
      real,dimension(:),pointer, device :: a
    end type

I have never used CUDA Fortran in my life, but it seems obvious enough to wager.
The problem here is how you have declared T_Device. To use host-side allocation, you first populate a host-memory copy of the device structure, and then copy it to device memory. This:

    type(mytypeDevice) :: T_Device(3)
    do i=1,3
      allocate(T_Device(i)%a(10))
    end do

will work correctly. This is a very standard design pattern in C++-based CUDA code, and the principle here is identical.
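
Putting the two steps together (populate a host-resident copy, then copy it to the device), a minimal sketch could look like the following. The names T_Host and buf are hypothetical and only there for illustration. I am also assuming that the compiler accepts a whole-array derived-type assignment from host to device (T_Device = T_Host) as the "copy it to device memory" step; I have not checked this against PGI 16.10, so treat it as an outline rather than a verified program.

```fortran
module structure
  type mytypeDevice
    integer :: alpha, beta, gamma
    real, dimension(:), pointer, device :: a   ! member points to device memory
  end type mytypeDevice
end module structure

program main
  use cudafor
  use structure
  implicit none
  ! Hypothetical name: host-resident copy of the structure array.
  type(mytypeDevice)         :: T_Host(3)
  ! Device-resident copy, the one a kernel would receive.
  type(mytypeDevice), device :: T_Device(3)
  real    :: buf(10)   ! hypothetical host staging buffer
  integer :: i

  do i = 1, 3
    ! The structure itself lives in host memory, so the host can
    ! fill in the member; the member's storage is on the device.
    allocate(T_Host(i)%a(10))
    buf = real(i)
    T_Host(i)%a = buf          ! host-to-device copy of the data
  end do

  ! Assumption: derived-type assignment copies the populated
  ! structures (including the device pointers) to device memory.
  T_Device = T_Host
end program main
```

The point of the design is that the host never touches a structure that itself lives in device memory: all allocation goes through T_Host, and T_Device only ever receives a finished copy.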