cudaMemcpy2D error with large array

Question

I am trying to use cudaMallocPitch and cudaMemcpy2D, but when I call cudaMemcpy2D with a large array, I get the following error:

Segmentation fault

Here is runnable source code that completes without reporting an error:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <cstdio>   // printf
#include <cstdlib>  // malloc
#include <iostream>
#include <random>

#define ROW_SIZE 32
#define COL_SIZE 1024

int main()
{
    float ** pfTest;
    pfTest = (float**)malloc(ROW_SIZE * sizeof(float*));
    for (int i = 0; i < ROW_SIZE; i++) {
        pfTest[i] = (float*)malloc(COL_SIZE * sizeof(float));
    }

    std::default_random_engine generator;
    std::uniform_real_distribution<float> distribution;

    for (int y = 0; y < ROW_SIZE; y++) {
        for (int x = 0; x < COL_SIZE; x++) {
            pfTest[y][x] = distribution(generator);
        }
    }

    float *dev_Test;
    size_t pitch;
    cudaMallocPitch(&dev_Test, &pitch, COL_SIZE * sizeof(float), ROW_SIZE);
    cudaMemcpy2D(dev_Test, pitch, pfTest, COL_SIZE * sizeof(float), COL_SIZE * sizeof(float),  ROW_SIZE, cudaMemcpyHostToDevice);
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    return 0;
}

As you can see, this runs without any problem. But when I increase COL_SIZE to around 500,000 (524288, to be exact), it crashes with a segmentation fault.

What is the source of the problem?

Answer 1

cudaMemcpy2D can only be used for copying pitched linear memory. Your source array is not pitched linear memory; it is an array of pointers to separately allocated rows. This is not supported and is the source of the segfault.
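To see why the small case appears to work while the large one crashes, it may help to compare how many bytes a pitched copy reads from its source with how many bytes the pointer table actually holds. The sketch below is only illustrative: pitched_span_bytes and pointer_table_bytes are hypothetical helper names, not part of the CUDA API, and the formula assumes the usual pitched layout in which row r starts r * pitch bytes after the base address.

```cpp
#include <cstddef>

// Number of bytes a pitched 2D copy touches in its source, counted from the
// base address: the last row starts at (rows - 1) * pitch and contributes
// `width` more bytes. (Hypothetical helper, not a CUDA function.)
std::size_t pitched_span_bytes(std::size_t pitch, std::size_t width, std::size_t rows)
{
    return rows == 0 ? 0 : (rows - 1) * pitch + width;
}

// Number of bytes actually allocated for a table of `rows` row pointers,
// i.e. what pfTest points at in the question. (Hypothetical helper.)
std::size_t pointer_table_bytes(std::size_t rows)
{
    return rows * sizeof(float*);
}
```

With ROW_SIZE 32 and COL_SIZE 1024, the copy spans 131072 bytes starting at pfTest, while the pointer table holds only 32 * sizeof(float*) bytes. The overrun probably lands in neighbouring heap memory, so garbage is copied without a crash. With COL_SIZE 524288 the span grows to 67108864 bytes (64 MiB), which most likely runs past the end of mapped memory.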

Try something like this:

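float*  buffer;  // one contiguous block holding every row
float** pfTest;  // optional row pointers into buffer, for [][] indexing
const size_t buffer_pitch = size_t(COL_SIZE) * sizeof(float);  // bytes per row
buffer = (float*)malloc(size_t(ROW_SIZE) * buffer_pitch);
pfTest = (float**)malloc(ROW_SIZE * sizeof(float*));
for (size_t i = 0; i < ROW_SIZE; i++) {
    pfTest[i] = buffer + i * size_t(COL_SIZE);  // row i starts i * COL_SIZE floats in
}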

// ...

cudaMallocPitch(&dev_Test, &pitch, buffer_pitch, ROW_SIZE);
// The source is the contiguous buffer, not the pointer table pfTest
cudaMemcpy2D(dev_Test, pitch, buffer, buffer_pitch,
             buffer_pitch, ROW_SIZE, cudaMemcpyHostToDevice);

[Note: written in browser, never tested or compiled, use at own risk]

That is, store the data to be copied in a single contiguous memory allocation, which can act as a pitched linear source for cudaMemcpy2D. If you insist on using [][] style indexing on the host, you pay the penalty of an additional array of pointers stored alongside the data. Note that this pointer array isn't actually necessary: you could index directly into buffer and achieve the same result while saving memory.
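Direct indexing into the contiguous buffer could look roughly like the following. This is a minimal host-side sketch, not tested against the code above: flat_index and make_filled are hypothetical names chosen for illustration, and a std::vector stands in for the malloc'd buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Map a (row, col) pair to an offset in a row-major contiguous buffer.
// (Hypothetical helper, not part of any library.)
inline std::size_t flat_index(std::size_t row, std::size_t col, std::size_t n_cols)
{
    assert(col < n_cols);  // a column outside the row would alias the next row
    return row * n_cols + col;
}

// Allocate one contiguous block for all rows and fill element (y, x) with
// its own flat offset, so the layout is easy to inspect.
std::vector<float> make_filled(std::size_t n_rows, std::size_t n_cols)
{
    std::vector<float> buffer(n_rows * n_cols);
    for (std::size_t y = 0; y < n_rows; y++) {
        for (std::size_t x = 0; x < n_cols; x++) {
            buffer[flat_index(y, x, n_cols)] = static_cast<float>(y * n_cols + x);
        }
    }
    return buffer;
}
```

With this layout, buffer.data() would take the place of buffer as the source argument of cudaMemcpy2D, with a source pitch of n_cols * sizeof(float), and no separate array of row pointers is needed.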
