
CUDA/OpenGL interoperability: cudaErrorMemoryAllocation error on cudaGraphicsGLRegisterBuffer

I am getting a seemingly random CUDA memory allocation error when using cudaGraphicsGLRegisterBuffer(). I have a fairly large OpenGL PBO that is shared between OpenGL and CUDA. The PBO is created as follows:

GLuint buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, buffer);
// rows and cols are both 5000, so this allocates rows * cols * 4 bytes (~95 MB)
glBufferData(GL_PIXEL_UNPACK_BUFFER, rows * cols * 4, NULL, GL_DYNAMIC_COPY);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

The buffer is quite large (rows and cols are both 5000), but it allocates fine on my GPU. I share it between OpenGL and CUDA through a small class that manages the CUDA graphics resource:

class CudaPBOGraphicsResource
{
public:
    CudaPBOGraphicsResource(GLuint pbo_id);
    ~CudaPBOGraphicsResource();
    inline cudaGraphicsResource_t resource() const { return _cgr; }
private:
    cudaGraphicsResource_t          _cgr;
};

CudaPBOGraphicsResource::CudaPBOGraphicsResource(GLuint pbo_id)
{
    checkCudaErrors(cudaGraphicsGLRegisterBuffer(&_cgr, pbo_id,
                    cudaGraphicsRegisterFlagsNone));
    checkCudaErrors(cudaGraphicsMapResources(1, &_cgr, 0));
}

CudaPBOGraphicsResource::~CudaPBOGraphicsResource()
{
    if (_cgr) {
        checkCudaErrors(cudaGraphicsUnmapResources(1, &_cgr, 0));
    }
}

Now I do the OpenGL/CUDA interoperability as follows (this block runs every time I need the buffer in CUDA):

{
    CudaPBOGraphicsResource input_cpgr(pbo_id);
    uchar4 *input_ptr = 0;
    size_t num_bytes;
    checkCudaErrors(cudaGraphicsResourceGetMappedPointer((void **)&input_ptr,
                    &num_bytes, input_cpgr.resource()));

    call_my_kernel(input_ptr);
}
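
For context, call_my_kernel() is currently little more than a stub. A minimal sketch of the kind of kernel it would launch (the kernel body, names, and launch configuration here are illustrative assumptions, not my actual code):

// Illustrative placeholder: a trivial kernel writing into the mapped PBO.
// The real call_my_kernel() currently does essentially nothing.
__global__ void fill_kernel(uchar4 *dst, int rows, int cols)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < cols && y < rows)
        dst[y * cols + x] = make_uchar4(128, 128, 128, 255);
}

void call_my_kernel(uchar4 *input_ptr)
{
    const int rows = 5000, cols = 5000;  // matches the PBO allocation above
    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    fill_kernel<<<grid, block>>>(input_ptr, rows, cols);
}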

This runs on my inputs for a while, but after some time it crashes with:

CUDA error code=2(cudaErrorMemoryAllocation) 
                 "cudaGraphicsGLRegisterBuffer(&_cgr, pbo_id, 
                  cudaGraphicsRegisterFlagsNone)" 
Segmentation fault

I am not sure why any memory allocation is going on at all, since I thought the buffer was shared rather than copied. I added cudaDeviceSynchronize() after the kernel call, but the error persists. My call_my_kernel() function is now doing pretty much nothing, so there are no other CUDA calls that could raise this error!
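
(Concretely, the check I added after the launch looks roughly like this, a sketch using the checkCudaErrors helper from the CUDA samples:)

call_my_kernel(input_ptr);
checkCudaErrors(cudaGetLastError());       // reports launch/configuration errors
checkCudaErrors(cudaDeviceSynchronize());  // reports errors raised while the kernel ran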

I am using CUDA 7 on Linux with a Quadro K4000 card.

EDIT: I updated the driver to the latest 346.72 version and the error still happens. It also does not depend on the kernel call; just calling cudaGraphicsGLRegisterBuffer() repeatedly seems to leak memory on the GPU. Running nvidia-smi while the program is running shows GPU memory usage climbing steadily. I am still at a loss as to why any copying is happening...

OK, I found the answer to my conundrum, and I hope it helps anyone else using CUDA and OpenGL together.

The problem was that I was calling:

checkCudaErrors(cudaGraphicsGLRegisterBuffer(&_cgr, pbo_id,
                cudaGraphicsRegisterFlagsNone));

every time. The registration actually needs to happen only once; after that, I only need to map/unmap the _cgr object each time I use the buffer.
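
In other words, registration is a one-time operation (and should eventually be paired with cudaGraphicsUnregisterResource()), while mapping and unmapping can happen on every use. A sketch of how the class can be restructured (the map()/unmap() method names are my own choice, not a library API):

class CudaPBOGraphicsResource
{
public:
    explicit CudaPBOGraphicsResource(GLuint pbo_id)
    {
        // Register exactly once for the lifetime of the PBO.
        checkCudaErrors(cudaGraphicsGLRegisterBuffer(&_cgr, pbo_id,
                        cudaGraphicsRegisterFlagsNone));
    }

    ~CudaPBOGraphicsResource()
    {
        // Unregister once, releasing the CUDA-side bookkeeping.
        checkCudaErrors(cudaGraphicsUnregisterResource(_cgr));
    }

    uchar4 *map()
    {
        // Map per use; this is cheap compared to re-registering.
        checkCudaErrors(cudaGraphicsMapResources(1, &_cgr, 0));
        uchar4 *ptr = 0;
        size_t num_bytes = 0;
        checkCudaErrors(cudaGraphicsResourceGetMappedPointer(
                        (void **)&ptr, &num_bytes, _cgr));
        return ptr;
    }

    void unmap()
    {
        checkCudaErrors(cudaGraphicsUnmapResources(1, &_cgr, 0));
    }

private:
    cudaGraphicsResource_t _cgr;
};

With this, the per-use code becomes:

uchar4 *input_ptr = input_cpgr.map();
call_my_kernel(input_ptr);
input_cpgr.unmap();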
