I tried to use cudaMallocPitch
and cudaMemcpy2D
, but when I tried to use cudaMemcpy2D
with large array, I encountered a problem:
Segmentation fault
Here is the runnable source code, with no error.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <random>
#define ROW_SIZE 32
#define COL_SIZE 1024
int main()
{
float ** pfTest;
pfTest = (float**)malloc(ROW_SIZE * sizeof(float*));
for (int i = 0; i < ROW_SIZE; i++) {
pfTest[i] = (float*)malloc(COL_SIZE * sizeof(float));
}
std::default_random_engine generator;
std::uniform_real_distribution<float> distribution;
for (int y = 0; y < ROW_SIZE; y++) {
for (int x = 0; x < COL_SIZE; x++) {
pfTest[y][x] = distribution(generator);
}
}
float *dev_Test;
size_t pitch;
cudaMallocPitch(&dev_Test, &pitch, COL_SIZE * sizeof(float), ROW_SIZE);
cudaMemcpy2D(dev_Test, pitch, pfTest, COL_SIZE * sizeof(float), COL_SIZE * sizeof(float), ROW_SIZE, cudaMemcpyHostToDevice);
printf("%s\n", cudaGetErrorString(cudaGetLastError()));
return 0;
}
As you can see, there's no problem at all. But, when I tried to extend COL_SIZE
to around 500,000
(exactly, 524288), it crashes with segmentation fault.
Any help as to the source of the problem?
cudaMemcpy2D
can only be used for copying pitched linear memory. Your source array is not pitched linear memory, it is an array of pointers. This is not supported and is the source of the segfault.
Try something like this:
float* buffer;
float** pfTest;
const size_t buffer_pitch = size_t(COL_SIZE) * sizeof(float);
buffer = (float*)malloc(size_t(ROW_SIZE) * buffer_pitch);
pfTest = (float**)malloc(ROW_SIZE * sizeof(float*));
for (size_t i = 0; i < ROW_SIZE; i++) {
pfTest[i] = buffer + i * size_t(COL_SIZE);
}
// ...
cudaMallocPitch(&dev_Test, &pitch, buffer_pitch, ROW_SIZE);
cudaMemcpy2D(dev_Test, pitch, buffer, buffer_pitch,
buffer_pitch, ROW_SIZE, cudaMemcpyHostToDevice);
[Note: written in browser, never tested or compiled, use at own risk]
ie store the data to be copied in a single contiguous memory allocation which can act as a pitched linear source for cudaMemcpy2D. If you insist on using [][]
style indexing on the host, then you have to pay the penalty of having an additional array of pointers to store alongside the data. Note that isn't actually necessary, and you could just directly index into buffer
and achieve the same result, while saving memory at the same time.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.