GPU: Tesla M60; Driver: 510.47.03; OS: Ubuntu 20.04.5 LTS; CUDA Version: 11.6. Trying the code below to get back full metrics on profiling a CUDA applicat ...
I am profiling a test code presented in the Unified Memory for CUDA Beginners on NVIDIA's developer forum. Code: QUESTION: The results of the prof ...
I get the message in the subject when I try to run a program I developed with OpenACC through Nvidia's nvprof profiler like this: If I run nvprof w ...
According to the profiler user guide: flop_count_sp: Number of single-precision floating-point operations executed by non-predicated threads (add, ...
I am trying to get some insight into why my CUDA kernel has relatively low performance, and I am hoping to get some answers with the NVIDIA profiler. ...
When I list nvprof's metrics with nvprof --query-events I see: thread_inst_executed: Number of instructions executed by the active threads. For ...
I used nvprof to profile a simple vecadd example (n=1024) on P100 but observed the dram_write_bytes is only 256 (rather than 1024*4 that I expected). ...
When I try to run the nvprof command in Command Prompt, a System Error pops up and says "The code execution cannot proceed because cupti64_102.dll was not f ...
Running the nvprof --metrics command under Windows gives an error. If I only use the nvprof command, no error is reported: I would like ...
I would like to profile the training loop of a transformer model written in Tensorflow on a multi-GPU system. Since the code doesn't support tf2, I ca ...
I am profiling a tensorflow GPU application with NVIDIA's command line Visual Profiler nvprof, and one of the kernels that was launched and is very ac ...
I'm running nvprof to profile GPU usage of a TensorRT server-client model. Here's what I'm doing: Run nvprof on terminal 1 within a docker contain ...
For perfectly coalesced accesses to an array of 4096 doubles, each 8 bytes, nvprof reports the following metrics on an NVIDIA Tesla V100: I cannot ...
When I use os.environ['CUDA_VISIBLE_DEVICES'] in PyTorch, I get the following message: What does this actually mean? How can I avoid this by using ' ...
I ran nvprof.exe on a function that initializes data, calls three kernels, and frees the data. Everything was profiled as it should be, and I got a result like this: A ...
Do the SMs shown in the "occupancy graph" correspond to blockIdx.x or register %smid? Here's an example of such a graph. And here's some sample ou ...
There are two nvprof metrics regarding load/store instructions and they are ldst_executed and ldst_issued. We know that executed<=issued. I expect ...
I am trying to profile my CUDA program, using the nvprof tool. Here is my code: I compiled it using the command nvcc add.cu -o add_cuda. I then ...
I'm trying to pre-fetch some data. Usually I rely on the compiler to do this, as compilers have many many thousands of people working on them, and I a ...
Is there a way to get CUDA's nvprof to include function calls like malloc in its statistical profiler? I've been trying to improve the performance of ...