
Mysterious Seg Faults with cudaMalloc

Can anyone help me to understand why the following code causes a segmentation fault? Likewise, can anyone help me understand why swapping out the two lines labelled "bad" for the two lines labelled "good" does not result in a segmentation fault?

Note that the seg fault seems to occur at the cudaMalloc line; if I comment that out I also do not see a segmentation fault. These allocations seem to be stepping on each other, but I don't understand how.

The intent of the code is to set up three structures:

  • h_P on the host, which will be populated by a CPU routine
  • d_P on the device, which will be populated by a GPU routine
  • h_P_copy on the host, which will be populated by copying the GPU data structure back in

That way I can verify correct behavior and benchmark one vs the other.
All of those are, indeed, four-dimensional arrays.

(If it matters, the card in question is a GTX 580, using nvcc 4.2 under SUSE Linux.)

#define NUM_STATES              32
#define NUM_MEMORY              16

int main( int argc, char** argv) {

        // allocate and create P matrix
        int P_size      = sizeof(float) * NUM_STATES * NUM_STATES * NUM_MEMORY * NUM_MEMORY;
        // float *h_P      = (float*) malloc (P_size);  **good**
        // float *h_P_copy = (float*) malloc (P_size);  **good**
        float h_P[P_size];                            //  **bad**
        float h_P_copy[P_size];                       //  **bad**
        float *d_P;
        cudaMalloc( (void**) &d_P, P_size);
        cudaMemset( d_P, 0.0, P_size);

}

This is likely due to stack corruption of some sort.

Notes:

  • The "good" lines allocate out of the system heap; the "bad" lines allocate stack storage.
  • Normally the amount you can allocate from the stack is quite a bit smaller than what you can allocate from the heap.
  • The "good" and "bad" declarations are not reserving the same amount of float storage. The "bad" lines allocate 4x as much, because P_size is already a byte count and is used there as an element count.
  • Finally, cudaMemset, just like memset, sets bytes: its value argument is interpreted as an unsigned char, not as a float (0.0) quantity.

Since the cudaMalloc line is the first one that actually "uses" (attempts to set) any of the allocated stack storage in the "bad" case, it is where the seg fault occurs. If you added an additional declaration like so:

    float *d_P;
    float myval;  //add
    myval = 0.0f; //add2
    cudaMalloc( (void**) &d_P, P_size);

I suspect you might see the seg fault occur on the "add2" line, as it would then be the first to make use of the corrupted stack storage.

The two lines labeled good are allocating 262144 * sizeof(float) bytes. The two lines labeled bad are allocating 262144 * sizeof(float) * sizeof(float) bytes.
