CUDA - cudaMallocPitch and cudaMemcpy2D use, Error: InvalidValue, InvalidPitchValue

Question

Okay, so I'm trying to get a 2D array for CUDA to work on, but it's becoming a pain. The errors are in the title and occur at the cudaMemcpy2D calls. I think the problem is obvious to trained eyes. Thank you in advance for any help; I've stepped ahead of my class, which is currently learning pointers.

#include <cuda_runtime.h>
#include <iostream>
#pragma comment (lib, "cudart")

/* Program purpose: pass a 10 x 10 matrix and multiply it by another 10x10 matrix */

float matrix1_host[100][100];
float matrix2_host[100][100];

float* matrix1_device;
float* matrix2_device;  
size_t pitch;
cudaError_t err;

__global__ void addMatrix(float* matrix1_device,float* matrix2_device, size_t pitch){
    // How this works
    // first we start to cycle through the rows by using the thread's ID
    // then we calculate an address from the address of a point in the row, by adding the pitch (size of each row) and  * it by
    // the amount of rows we've already completed, then we can use that address of somewhere at a start of a row to get the colums 
    // in the row with a normal array grab. 

    int r = threadIdx.x;

        float* rowofMat1 = (float*)((char*)matrix1_device + r * pitch);
        float* rowofMat2 = (float*)((char*)matrix2_device + r * pitch);
        for (int c = 0; c < 100; ++c) {
             rowofMat1[c] += rowofMat2[c];
        }

}

void initCuda(){
    err = cudaMallocPitch((void**)matrix1_device, &pitch, 100 * sizeof(float), 100);
    err = cudaMallocPitch((void**)matrix2_device, &pitch, 100 * sizeof(float), 100); 
    //err = cudaMemcpy(matrix1_device, matrix1_host, 100*100*sizeof(float), cudaMemcpyHostToDevice);
    //err = cudaMemcpy(matrix2_device, matrix2_host, 100*100*sizeof(float), cudaMemcpyHostToDevice);
    err = cudaMemcpy2D(matrix1_device, 100*sizeof(float), matrix1_host, pitch, 100*sizeof(float), 100, cudaMemcpyHostToDevice);
    err = cudaMemcpy2D(matrix2_device, 100*sizeof(float), matrix2_host, pitch, 100*sizeof(float), 100, cudaMemcpyHostToDevice);
}

void populateArrays(){
    for(int x = 0; x < 100; x++){
        for(int y = 0; y < 100; y++){
            matrix1_host[x][y] = (float) x + y;
            matrix2_host[y][x] = (float) x + y;
        }
    }
}

void runCuda(){
    dim3 dimBlock ( 100 );
    dim3 dimGrid ( 1 );
    addMatrix<<<dimGrid, dimBlock>>>(matrix1_device, matrix2_device, 100*sizeof(float)); 
    //err = cudaMemcpy(matrix1_host, matrix1_device, 100*100*sizeof(float), cudaMemcpyDeviceToHost);
    err = cudaMemcpy2D(matrix1_host, 100*sizeof(float), matrix1_device, pitch, 100*sizeof(float),100, cudaMemcpyDeviceToHost);
    //cudaMemcpy(matrix1_host, matrix1_device, 100*100*sizeof(float), cudaMemcpyDeviceToHost);
}

void cleanCuda(){
    err = cudaFree(matrix1_device);
    err = cudaFree(matrix2_device);

    err = cudaDeviceReset();
}


int main(){
    populateArrays();
    initCuda();
    runCuda();
    cleanCuda();
    std::cout << cudaGetErrorString(cudaGetLastError());
    system("pause");
    return 0;
}

Answer 1

First of all, in general you should have a separate pitch variable for matrix1 and matrix2. In this case both calls to cudaMallocPitch will return the same value, but in the general case they may not.
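As a minimal sketch of keeping one pitch per allocation: the struct PitchedMatrix and the helper allocPitched below are hypothetical names made up for this example, not part of the CUDA runtime. The sketch also passes the address of the device pointer to cudaMallocPitch. The code in the question casts the pointer itself, (void**)matrix1_device, so the runtime has nowhere valid to store the result, which is a likely contributor to the InvalidValue error.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper type: keeps each device pointer next to its own pitch.
struct PitchedMatrix {
    float* data  = nullptr;  // device pointer written by cudaMallocPitch
    size_t pitch = 0;        // row stride in bytes chosen by the runtime
};

// Allocates rows x cols floats of pitched device memory.
cudaError_t allocPitched(PitchedMatrix* m, size_t cols, size_t rows) {
    // Pass the ADDRESS of the pointer so the runtime can fill it in.
    return cudaMallocPitch((void**)&m->data, &m->pitch,
                           cols * sizeof(float), rows);
}
```

With two PitchedMatrix objects, each call records its own pitch, so the second allocation cannot overwrite the value returned for the first.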

In your cudaMemcpy2D line, the second parameter to the call is the destination pitch. This is just the pitch value that was returned by the cudaMallocPitch call for this particular destination matrix (i.e. the first parameter).

The fourth parameter is the source pitch. Since the source was allocated with an ordinary host allocation, it has no pitch other than its width in bytes.
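A small host-only check makes the "width in bytes" point concrete. It assumes the same 100 x 100 static array as in the question; hostRowBytes is a name introduced only for this sketch.

```cuda
#include <cstddef>

float matrix1_host[100][100];

// A static 2D array is contiguous: consecutive rows are exactly one row's
// worth of bytes apart, so that is the pitch to report for the host side.
constexpr size_t hostRowBytes = sizeof(matrix1_host[0]);

static_assert(hostRowBytes == 100 * sizeof(float),
              "host pitch equals the row width in bytes");
```

Pitched device memory differs because the runtime may pad each row, so the device pitch can be larger than 100 * sizeof(float).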

So you have your second and fourth parameters swapped.

So instead of this:

err = cudaMemcpy2D(matrix1_device, 100*sizeof(float), matrix1_host, pitch, 100*sizeof(float), 100, cudaMemcpyHostToDevice);

Try this:

err = cudaMemcpy2D(matrix1_device, pitch, matrix1_host, 100*sizeof(float), 100*sizeof(float), 100, cudaMemcpyHostToDevice);

The same applies to the second call to cudaMemcpy2D. The third call is actually OK: it goes in the opposite direction, so the source and destination matrices are swapped and line up with your pitch parameters correctly.
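Putting the pieces together, here is a hedged sketch of the whole round trip under the assumptions of the question (100 x 100 floats, one thread per row). The helper name addOnDevice and the flat host pointers are choices made for this example. One detail goes beyond the answer above: the kernel receives the pitch values returned by cudaMallocPitch, whereas the question's launch passes 100*sizeof(float), which is only correct if the runtime happens not to pad the rows.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

const int N = 100;  // matrices are N x N, as in the question

// One thread per row; each row starts `pitch` bytes after the previous one.
__global__ void addMatrix(float* m1, size_t pitch1,
                          const float* m2, size_t pitch2) {
    int r = threadIdx.x;
    if (r >= N) return;
    float* row1 = (float*)((char*)m1 + r * pitch1);
    const float* row2 = (const float*)((const char*)m2 + r * pitch2);
    for (int c = 0; c < N; ++c) {
        row1[c] += row2[c];
    }
}

// Hypothetical helper: computes a += b on the device for contiguous
// N x N host buffers and returns the first error encountered.
cudaError_t addOnDevice(float* a, const float* b) {
    float* dA = nullptr;
    float* dB = nullptr;
    size_t pitchA = 0, pitchB = 0;            // one pitch per allocation
    const size_t rowBytes = N * sizeof(float); // host pitch = row width

    cudaError_t err = cudaMallocPitch((void**)&dA, &pitchA, rowBytes, N);
    if (err == cudaSuccess)
        err = cudaMallocPitch((void**)&dB, &pitchB, rowBytes, N);

    // Host to device: destination pitch is the device pitch,
    // source pitch is the host row width in bytes.
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dA, pitchA, a, rowBytes, rowBytes, N,
                           cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dB, pitchB, b, rowBytes, rowBytes, N,
                           cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        addMatrix<<<1, N>>>(dA, pitchA, dB, pitchB);
        err = cudaGetLastError();  // reports launch failures
    }

    // Device to host: the two pitches swap positions.
    if (err == cudaSuccess)
        err = cudaMemcpy2D(a, rowBytes, dA, pitchA, rowBytes, N,
                           cudaMemcpyDeviceToHost);

    cudaFree(dA);  // freeing a null pointer is a no-op
    cudaFree(dB);
    return err;
}
```

With the arrays from the question, the call would look like addOnDevice(&matrix1_host[0][0], &matrix2_host[0][0]) after populateArrays(). Checking the returned error after the call, rather than only once at the end of main, makes it easier to see which step failed.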

solution1
3 ACCEPTED 2013-03-15 04:53:42
