Newest questions tagged [nvprof]

ERR_NVGPUCTRPERM error when launching nvprof with all metrics to profile CUDA application

GPU: Tesla M60, Driver: 510.47.03, OS: Ubuntu 20.04.5 LTS, CUDA Version: 11.6. Trying the code below to get back full metrics on profiling a CUDA applicat ...

Profilers (nvvp and nvprof) not showing "Page Fault" information

I am profiling a test code presented in the Unified Memory for CUDA Beginners on NVIDIA's developer forum. Code: QUESTION: The results of the prof ...

nvprof Warning: The path to CUPTI and CUDA Injection libraries might not be set in LD_LIBRARY_PATH

I get the message in the subject when I try to run a program I developed with OpenACC through Nvidia's nvprof profiler like this: If I run nvprof w ...

Meaning of the “flop_count_sp” and “inst_fp_32” metric in CUDA Profiler

According to the profiler user guide: flop_count_sp: Number of single-precision floating-point operations executed by non-predicated threads (add, ...

NVIDIA Visual Profiler: Insufficient kernel bounds data

I am trying to get some insight into why my CUDA kernel has relatively low performance, and I am hoping to get some answers with the NVIDIA profiler. ...

Why don't I get “thread_inst_executed”

When I list nvprof's events with nvprof --query-events I see: thread_inst_executed: Number of instructions executed by the active threads. For ...

dram_write_bytes result on P100

I used nvprof to profile a simple vecadd example (n=1024) on a P100 but observed that dram_write_bytes is only 256 (rather than the 1024*4 that I expected). ...
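The expectation stated in this excerpt can be written out as a small calculation. This is only a sketch of the arithmetic: it assumes 4-byte (float) elements, which is what the "1024*4" in the question suggests, and the helper name `expected_output_bytes` is hypothetical. It does not attempt to explain why the profiler reports a smaller number.

```python
def expected_output_bytes(n, bytes_per_element=4):
    """Bytes written if every output element is stored exactly once.

    bytes_per_element=4 is an assumption (single-precision floats).
    """
    return n * bytes_per_element


expected = expected_output_bytes(1024)  # 1024 * 4 bytes
observed = 256                          # value quoted in the question
ratio = expected // observed            # how far apart the two numbers are

print(f"expected={expected} observed={observed} ratio={ratio}")
```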

nvprof command error: cupti64_102.dll was not found

When I try to run the nvprof command in Command Prompt, a System Error pops up and says "The code execution cannot proceed because cupti64_102.dll was not f ...

Running the nvprof --metrics command under Windows gives an error: CUDA profiling error

Running the nvprof --metrics command under Windows gives an error: If I only use the nvprof command, no error is reported: I would like ...

How to capture GPU data when profiling Tensorflow code with nvprof?

I would like to profile the training loop of a transformer model written in Tensorflow on a multi-GPU system. Since the code doesn't support tf2, I ca ...

What is redzone_checker? Profiling my tensorflow application on a GPU

I am profiling a TensorFlow GPU application with NVIDIA's command-line profiler nvprof, and one of the kernels that was launched and is very ac ...

How to stop running TensorRT server without using ctrl-c (for profiling with nvprof)

I'm running nvprof to profile GPU usage of a TensorRT server-client model. Here's what I'm doing: Run nvprof on terminal 1 within a docker contain ...

What is a transaction and a request in the 'gld_transactions_per_request' metric of the CUDA profiler?

For perfectly coalesced accesses to an array of 4096 doubles, each 8 bytes, nvprof reports the following metrics on an NVIDIA Tesla V100: I cannot ...
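As a rough aid for reasoning about this scenario, the sketch below estimates how many warp-level load requests and how many transactions per request a perfectly coalesced access pattern would generate. It rests on assumptions not confirmed by the excerpt: that a "request" is one load instruction executed by a 32-thread warp, and that the transaction size is passed in by the caller (32 bytes is used here purely as an example value). The function name `coalesced_load_estimate` is hypothetical, and the metrics actually reported in the question are not reproduced.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def coalesced_load_estimate(num_elements, element_bytes, transaction_bytes):
    """Estimate (requests, transactions_per_request) for coalesced loads.

    Assumes one element per thread and one request per warp-level load.
    transaction_bytes is an assumed granularity supplied by the caller.
    """
    # Number of warp-level load instructions (ceiling division).
    requests = -(-num_elements // WARP_SIZE)
    # Contiguous bytes touched by one fully coalesced warp.
    bytes_per_request = WARP_SIZE * element_bytes
    # Transactions needed to cover those bytes (ceiling division).
    transactions_per_request = -(-bytes_per_request // transaction_bytes)
    return requests, transactions_per_request


# 4096 doubles of 8 bytes each, with an assumed 32-byte transaction size.
requests, per_request = coalesced_load_estimate(4096, 8, 32)
print(f"requests={requests} transactions_per_request={per_request}")
```

Changing `transaction_bytes` shows how the ratio depends on the assumed granularity, which is one way to compare an estimate against whatever the profiler reports.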

nvprof warning on CUDA_VISIBLE_DEVICES

When I use os.environ['CUDA_VISIBLE_DEVICES'] in pytorch, I get the following message What does this actually mean? How can I avoid this by using ' ...

No GPU activities in profiling with nvprof

I run nvprof.exe on the function that initializes data, calls three kernels and frees data. All profiled as it should and I got a result like this: A ...

Do the SMs shown in the “occupancy graph” correspond to `blockIdx.x` or register `%smid`?

Do the SMs shown in the "occupancy graph" correspond to blockIdx.x or register %smid? Here's an example of such a graph And here's some sample ou ...

Issued load/store instructions for replay

There are two nvprof metrics regarding load/store instructions and they are ldst_executed and ldst_issued. We know that executed<=issued. I expect ...

nvprof - profiling data are not recorded

I am trying to profile my CUDA program, using the nvprof tool. Here is my code: I compiled it using the command nvcc add.cu -o add_cuda. I then ...

How to get algorithmic prefetching to work in CUDA

I'm trying to pre-fetch some data. Usually I rely on the compiler to do this, as compilers have many many thousands of people working on them, and I a ...

How to get malloc to show up in nvprof's statistical profiler?

Is there a way to get CUDA's nvprof to include function calls like malloc in its statistical profiler? I've been trying to improve the performance of ...