Passing CUDA Random Generator State by reference
Is the following code correct when passing the random generator state by reference in the functions CalculateValue(curandState *localState) and GetExponential(curandState *localState) (CUDA Toolkit 3.2, curand.lib)?
Thanks.
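#include <cstdio>
#include <cstdlib>
#include <curand_kernel.h>

/* Assumed: the post does not define the launch configuration or CUDA_CALL. */
const int threadsPerBlock = 256;
const int totalBlocks = 64;
const int totalThreads = totalBlocks * threadsPerBlock;
#define CUDA_CALL(x) do { if ((x) != cudaSuccess) { \
    printf("Error at %s:%d\n", __FILE__, __LINE__); return EXIT_FAILURE; } } while (0)

__device__ double GetExponential(curandState *localState) {
    double u1 = curand_uniform_double(localState);  /* uniform in (0, 1] */
    /* Assumed: the post omits the return value; unit-rate inverse transform. */
    return -log(u1);
}

__device__ double CalculateValue(curandState *localState) {
    double x = GetExponential(localState);
    return x;
}

__global__ void RunMonteCarloKernel(curandState *state, double *results) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    /* Copy state to local memory for efficiency */
    curandState localState = state[i];
    results[i] = CalculateValue(&localState);
    /* Copy state back to global memory */
    state[i] = localState;
}

__global__ void setup_kernel(curandState *state) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    /* Each thread gets different seed, a different sequence number, no offset */
    curand_init(i, i, 0, &state[i]);
}

int main(void) {
    double *devResults;
    curandState *devStates;
    /* Allocate space for prng states and results on device */
    CUDA_CALL(cudaMalloc((void **)&devStates, totalThreads * sizeof(curandState)));
    CUDA_CALL(cudaMalloc((void **)&devResults, totalThreads * sizeof(double)));
    /* Setup prng states */
    setup_kernel<<<totalBlocks, threadsPerBlock>>>(devStates);
    for (int i = 0; i < 1000; i++) {
        RunMonteCarloKernel<<<totalBlocks, threadsPerBlock>>>(devStates, devResults);
    }
    CUDA_CALL(cudaFree(devResults));
    CUDA_CALL(cudaFree(devStates));
    return 0;
}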
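Is there a problem? It looks fine to me.
You might want to look at the EstimatePiInlineP sample in the MonteCarloCURAND directory of the 3.2 SDK. It passes the state by C++ reference, which avoids taking the address of a local variable. You will still need to store the state back to memory at the end of the kernel (as your code already does).
Passing by C++ reference helps the compiler, because it makes clear that the function can operate directly on the data in the original registers. Taking the address of a local array on the GPU can hurt performance: if the compiler cannot determine that all threads treat the pointer in exactly the same way (i.e. perform the same operations on it), it will spill the array to local memory. That still works, but it can be slower.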
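As a rough illustration of the pass-by-reference style described above, here is a minimal sketch. It is not the SDK sample itself. The bounds parameter n, the launch sizes, the fixed seed 1234, and the -log(u) transform are all assumptions made for the example, and error handling is reduced to checking the allocations and the copy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <curand_kernel.h>

// The state arrives as a C++ reference, so the kernel never takes the
// address of its local copy explicitly.
__device__ double GetExponential(curandState &state) {
    // curand_uniform_double returns a value in (0, 1], so log() is safe.
    double u = curand_uniform_double(&state);
    return -log(u);  // assumed: unit-rate exponential via inverse transform
}

__device__ double CalculateValue(curandState &state) {
    return GetExponential(state);
}

__global__ void setup_kernel(curandState *state, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    // Assumed seeding: one seed, a distinct sequence number per thread.
    if (i < n) curand_init(1234ULL, i, 0, &state[i]);
}

__global__ void RunMonteCarloKernel(curandState *state, double *results, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;
    curandState localState = state[i];         // work on a local copy
    results[i] = CalculateValue(localState);   // passed by reference
    state[i] = localState;                     // store the advanced state back
}

int main(void) {
    const int threads = 128, blocks = 4, n = threads * blocks;  // hypothetical sizes
    curandState *devStates = NULL;
    double *devResults = NULL;
    if (cudaMalloc((void **)&devStates, n * sizeof(curandState)) != cudaSuccess ||
        cudaMalloc((void **)&devResults, n * sizeof(double)) != cudaSuccess) {
        return EXIT_FAILURE;
    }

    setup_kernel<<<blocks, threads>>>(devStates, n);
    RunMonteCarloKernel<<<blocks, threads>>>(devStates, devResults, n);

    static double host[threads * blocks];
    if (cudaMemcpy(host, devResults, n * sizeof(double),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        return EXIT_FAILURE;
    }

    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += host[i];
    printf("sample mean: %f\n", sum / n);

    cudaFree(devResults);
    cudaFree(devStates);
    return 0;
}
```

The only difference from the pointer version is the function signatures and the call site. The copy into localState and the write back to state[i] stay the same, because the kernel still has to save the advanced generator state for the next launch.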