
What makes cuLaunchKernel fail with CUDA_ERROR_INVALID_HANDLE?

I'm launching a CUDA kernel I've compiled, using the cuLaunchKernel() driver API function. I'm passing my parameters in a kernelParams array, and passing nullptr for the extra argument.

Unfortunately, this fails with the error CUDA_ERROR_INVALID_HANDLE. Why? I checked the Driver API documentation to see in which cases the function might fail, and it discusses the failure with CUDA_ERROR_INVALID_VALUE (not the same thing). It doesn't discuss the error I get.

Since there is more than one parameter to cuLaunchKernel() which is some sort of a handle - what does this failure mean? (And if there are multiple options - what are they?)

One possibility is a failure due to a CUDA driver context switch. You may have inadvertently performed some action which pushes or replaces the current context for the CUDA device. Loaded modules are part of the context they were loaded into, so the function handle for your compiled and loaded kernel belongs to a different context than the current one and cannot be launched there. This triggers a CUDA_ERROR_INVALID_HANDLE failure.

Assuming this is the case, switch to the right context before the launch, e.g. this way:

cuCtxPushCurrent(my_driver_context);
cuLaunchKernel(/*etc. etc. */);
/* possibly */ cuCtxPopCurrent(NULL);

or like so:

cuCtxSetCurrent(my_driver_context);
cuLaunchKernel(/*etc. etc. */);

Note that you risk leaking the context, and the memory it holds, if you pop and discard the only reference to a valid context. You also risk breaking other code which assumes that the context it has put in place is still the current one.
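One way to reduce the second risk is to tie the push and the pop to a scope, so the previous context is always restored when the launch code exits. Below is a minimal sketch of that idea, not part of the CUDA API: the names ScopedAction and demo_trace are hypothetical, and the lambdas only record the order of events so that the example runs without a GPU.

```cpp
#include <functional>
#include <string>
#include <utility>

// Runs `enter` on construction and `leave` on destruction, so the two calls
// stay paired even if the code in between returns early or throws.
class ScopedAction {
public:
    ScopedAction(const std::function<void()>& enter, std::function<void()> leave)
        : leave_(std::move(leave)) {
        enter();  // if this throws, the destructor never runs, so `leave` is skipped
    }
    ~ScopedAction() {
        if (leave_) leave_();
    }

    // Non-copyable: a copy would run `leave` a second time.
    ScopedAction(const ScopedAction&) = delete;
    ScopedAction& operator=(const ScopedAction&) = delete;

private:
    std::function<void()> leave_;
};

// Stand-in demonstration (assumption: no CUDA calls are made here).
// In real code the lambdas would wrap cuCtxPushCurrent and cuCtxPopCurrent.
std::string demo_trace() {
    std::string trace;
    {
        ScopedAction guard([&] { trace += "push,"; }, [&] { trace += "pop"; });
        trace += "launch,";  // stands in for cuLaunchKernel(...)
    }  // guard is destroyed here, so "pop" is appended last
    return trace;
}
```

In actual driver API code, the first lambda would call cuCtxPushCurrent(my_driver_context) and the second would call cuCtxPopCurrent with a pointer to a CUcontext variable, checking the CUresult of each call. Since the pop only removes what the guard itself pushed, whatever context the surrounding code had made current is left in place afterwards.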

Well, in my case it was an out-of-memory (OOM) error which for some reason was not reported as such. When I reduced the batch size of my model, it worked. You may want to check whether this is the case for you as well.


 