CUDA kernel launch failure when using various offsets for input data

Question

My code is giving an error message and I am trying to track down the cause. To make the problem easier to find, I have stripped away code that apparently is not relevant to the error. If you can tell me why the following simple code produces an error message, then I think I should be able to fix my original code:

#include "cuComplex.h"
#include <cutil.h>

__device__ void compute_energy(void *data, int isample, int nsamples) {
  cuDoubleComplex * const nminusarray          = (cuDoubleComplex*)data;
  cuDoubleComplex * const f                    = (cuDoubleComplex*)(nminusarray+101);
  double          * const abs_est_errorrow_all = (double*)(f+3);
  double          * const rel_est_errorrow_all = (double*)(abs_est_errorrow_all+nsamples*51);
  int             * const iid_all              = (int*)(rel_est_errorrow_all+nsamples*51);
  int             * const iiu_all              = (int*)(iid_all+nsamples*21);
  int             * const piv_all              = (int*)(iiu_all+nsamples*21);
  cuDoubleComplex * const energyrow_all        = (cuDoubleComplex*)(piv_all+nsamples*12);
  cuDoubleComplex * const refinedenergyrow_all = (cuDoubleComplex*)(energyrow_all+nsamples*51);
  cuDoubleComplex * const btplus_all           = (cuDoubleComplex*)(refinedenergyrow_all+nsamples*51);

  cuDoubleComplex * const btplus           = btplus_all+isample*21021;

  btplus[0] = make_cuDoubleComplex(0.0, 0.0);
}

__global__ void computeLamHeight(void *data, int nlambda) {
  compute_energy(data, blockIdx.x, nlambda);
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 184465640));
  computeLamHeight<<<dim3(101, 1, 1), dim3(512, 1, 1), 45000>>>(device_data, 101);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}

I am using a GeForce GTX 480 and I am compiling the code like so:

nvcc -L /soft/cuda-sdk/4.0.17/C/lib -I /soft/cuda-sdk/4.0.17/C/common/inc -lcutil_x86_64 -arch sm_13 -O3 -Xopencc "-Wall" Main.cu

The output is:

Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 31 : unspecified launch failure.

EDIT: I have now further simplified the code. The following simpler code still produces the error message:

#include <cutil.h>

__global__ void compute_energy(void *data) {
  *(double*)((int*)data+101) = 0.0;
}

int main(int argc, char *argv[]) {
  void *device_data;

  CUT_DEVICE_INIT(argc, argv);
  CUDA_SAFE_CALL(cudaMalloc(&device_data, 101*sizeof(int)+sizeof(double)));
  compute_energy<<<dim3(1, 1, 1), dim3(1, 1, 1)>>>(device_data);
  CUDA_SAFE_CALL(cudaThreadSynchronize());
}

Now it is easy to see that the offset is within the allocation. I tried running cuda-memcheck and it says the following:

========= CUDA-MEMCHECK
Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 13 : unspecified launch failure.
========= Invalid __global__ write of size 8
=========     at 0x00000020 in compute_energy
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x200200194 is misaligned
=========
========= ERROR SUMMARY: 1 error

I tried searching the internet to find out what is meant by the address being misaligned, but I failed to find an explanation. What is the deal?

Answer 1

It was very hard to parse your original code with all of those magic constants, but your updated repro case makes the problem immediately obvious. The GPU requires every memory access to be naturally aligned: the address must be a multiple of the size of the type being read or written. Your kernel contains a pointer access that is not correctly aligned. A double is a 64-bit type, and your address is not on an 8-byte boundary: with 4-byte ints, the offset is 101 * 4 = 404 bytes, which is not a multiple of 8. This:

*(double*)((int*)data+100) = 0.0; // byte offset 400, double index 50

or this:

*(double*)((int*)data+102) = 0.0; // byte offset 408, double index 51

are both legal. This:

*(double*)((int*)data+101) = 0.0; // byte offset 404, not aligned to a 64-bit boundary

is not.
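
One way to avoid this when packing mixed types into a single allocation is to round each sub-array's byte offset up to the alignment of its element type. The following is only a minimal sketch under stated assumptions: `align_up`, `is_aligned`, `Layout`, `plan_layout` and the `HD` macro are hypothetical helpers, not part of CUDA, and the code assumes 4-byte `int`, 8-byte `double`, and a base pointer from `cudaMalloc` (which is aligned to at least 256 bytes).

```cpp
#include <cstddef>

// Compiles as plain C++; under nvcc the helpers are also usable in device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Round a byte offset up to the next multiple of `alignment`.
// Assumption: `alignment` is a power of two (4, 8, 16, ...).
HD inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// True when `offset` (or a raw address) is a multiple of `alignment`.
HD inline bool is_aligned(std::size_t offset, std::size_t alignment) {
  return (offset & (alignment - 1)) == 0;
}

// Hypothetical layout for the reduced example: n ints followed by one double.
struct Layout {
  std::size_t double_offset;  // byte offset of the double from the base pointer
  std::size_t total_bytes;    // size to request from cudaMalloc
};

inline Layout plan_layout(std::size_t n_ints) {
  Layout l;
  // Pad after the ints so the double starts on an 8-byte boundary.
  l.double_offset = align_up(n_ints * sizeof(int), sizeof(double));
  l.total_bytes = l.double_offset + sizeof(double);
  return l;
}
```

For 101 ints this puts the double at byte 408 instead of 404 and asks for 416 bytes, so the kernel would write through `(double*)((char*)data + 408)`. The same arithmetic appears to explain the original code: with nsamples = 101, `energyrow_all` starts at byte 104*16 + 2*101*51*8 + 54*101*4 = 105896, which is 8 modulo 16, while `cuDoubleComplex` (a `double2`) is declared with 16-byte alignment. Placing the arrays with the largest element type first, or padding each offset as above, should keep every sub-array aligned.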

Answer 2

The error indicates an out-of-bounds memory access; check your offset values.

2 answers

Answer 1: 8 votes, accepted, 2012-08-06 17:13:05

Answer 2: 2 votes, 2012-08-06 04:35:07