
Size of statically allocated shared memory per block with Compute Prof (Cuda/OpenCL)

Original post 2010-10-13 20:39:11 5 1 cuda/ profiling/ opencl/ nvidia

In NVIDIA's Compute Prof there is a column called "static private mem per work group", and its tooltip says "Size of statically allocated shared memory per block". My application shows 64 (bytes, I assume) per block. Does that mean I am actually using somewhere between 1 and 64 of those bytes, or is the profiler only telling me that this amount of shared memory was allocated, with no indication of whether it was used at all?

1 answer

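If it's allocated, it's probably because you used it. AFAIK CUDA passes kernel parameters via shared memory (this is the case on devices of compute capability 1.x; later architectures pass them via constant memory), so that must be it.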
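To cross-check the number the profiler reports, one option is to ask the CUDA runtime directly. The following is a minimal sketch, not the asker's code: the kernel `reverseTile` and the helper `staticSharedBytes` are hypothetical names made up for illustration, while `cudaFuncGetAttributes` and the `sharedSizeBytes` field are part of the CUDA runtime API. The kernel declares 16 floats (64 bytes) of static shared memory; on compute capability 1.x hardware the reported figure may be larger, since kernel parameters also live in shared memory there.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: declares 16 floats (64 bytes) of
// statically allocated shared memory. Assumes it is launched with 16 threads.
__global__ void reverseTile(float *data)
{
    __shared__ float tile[16];
    int i = threadIdx.x;
    tile[i] = data[i];
    __syncthreads();          // make every write to tile visible to the block
    data[i] = tile[15 - i];   // read back in reverse order
}

// Hypothetical helper: returns the statically allocated shared memory per
// block that the runtime reports for reverseTile, or 0 if the query fails.
size_t staticSharedBytes()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, reverseTile);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 0;
    }
    return attr.sharedSizeBytes;
}

int main()
{
    // No kernel launch is needed; the attribute is a compile-time property.
    printf("static shared memory per block: %zu bytes\n", staticSharedBytes());
    return 0;
}
```

Note that this value, like the profiler column, describes how much shared memory the compiler reserved per block. It does not say how many of those bytes a given launch actually reads or writes.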

Related questions

Compute Prof's fields for incoherent and coherent gst/gld? (CUDA/OpenCL)

Statically allocated global memory struct in CUDA

CUDA shared memory cannot be allocated

Is it possible to compare more than two kernels executions at a time in Compute Prof (OpenCL/CUDA)

Using both dynamically-allocated and statically-allocated shared memory

Maximum (shared memory per block) / (threads per block) in CUDA with 100% MP load

Maximum threads per block vs shared memory size

Get allocated memory size for CUDA buffer

Which is the correct way of allocating shared memory statically in cuda and why?

CUDA: Tiled matrix-matrix multiplication with shared memory and matrix size which is non-multiple of the block size



solution1 1 ACCEPTED 2010-10-14 17:22:35
