
Profilers (nvvp and nvprof) not showing "Page Fault" information


I am profiling the test code provided in "Unified Memory for CUDA Beginners" on the NVIDIA Developer blog.

Code:

#include <iostream>
#include <math.h>

// CUDA kernel to add elements of two arrays
__global__
void add(int n, float* x, float* y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main(void)
{
    int N = 1 << 20;
    float* x, * y;

    // Allocate Unified Memory -- accessible from CPU or GPU
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));

    // initialize x and y arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Launch kernel on 1M elements on the GPU
    int blockSize = 256;
    int numBlocks = (N + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(N, x, y);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i] - 3.0f));
    std::cout << "Max error: " << maxError << std::endl;

    // Free memory
    cudaFree(x);
    cudaFree(y);

    return 0;
}

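Question: The profiling results shown by the author include information about "Page Faults", but when I run the nvprof and nvvp profilers, I don't get any page-fault information. Is there a flag, or something else that needs to be set explicitly, to get that information?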

My nvprof output:

==20160== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  60.513us         1  60.513us  60.513us  60.513us  add(int, float*, float*)
      API calls:   81.81%  348.14ms         2  174.07ms  1.5933ms  346.54ms  cudaMallocManaged
                   16.10%  68.511ms         1  68.511ms  68.511ms  68.511ms  cuDevicePrimaryCtxRelease
                    1.34%  5.7002ms         1  5.7002ms  5.7002ms  5.7002ms  cudaLaunchKernel
                    0.66%  2.8192ms         2  1.4096ms  1.0669ms  1.7523ms  cudaFree
                    0.07%  277.80us         1  277.80us  277.80us  277.80us  cudaDeviceSynchronize
                    0.01%  33.500us         3  11.166us  3.5000us  16.400us  cuModuleUnload
                    0.00%  19.800us         1  19.800us  19.800us  19.800us  cuDeviceTotalMem
                    0.00%  16.700us       101     165ns     100ns     900ns  cuDeviceGetAttribute
                    0.00%  9.2000us         3  3.0660us     200ns  8.2000us  cuDeviceGetCount
                    0.00%  3.1000us         1  3.1000us  3.1000us  3.1000us  cuDeviceGetName
                    0.00%  2.1000us         2  1.0500us     300ns  1.8000us  cuDeviceGet
                    0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetLuid
                    0.00%     200ns         1     200ns     200ns     200ns  cuDeviceGetUuid

==20160== Unified Memory profiling result:
Device "GeForce GTX 1070 (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
      64  128.00KB  128.00KB  128.00KB  8.000000MB  3.217900ms  Host To Device
     146  84.164KB  32.000KB  1.0000MB  12.00000MB  68.17800ms  Device To Host
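As a sanity check on the "Host To Device" row: `x` and `y` each hold N = 1 << 20 floats, so moving both arrays to the GPU once should account for the whole total. The host-only C++ sketch below assumes a 4-byte `float` and reads nvprof's KB/MB as binary units (KiB/MiB). Under those assumptions, 2 × 4 MiB = 8 MiB, and splitting 8 MiB into the fixed 128 KB chunks shown (Min Size = Max Size) gives exactly the reported Count of 64. `arrayBytes` and `chunkCount` are hypothetical helper names for illustration, not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one managed array of n floats.
constexpr std::size_t arrayBytes(std::size_t n) { return n * sizeof(float); }

// Number of transfers needed to move totalBytes in fixed chunkBytes pieces
// (rounded up, so a partial final chunk still counts as one transfer).
constexpr std::size_t chunkCount(std::size_t totalBytes, std::size_t chunkBytes) {
    return (totalBytes + chunkBytes - 1) / chunkBytes;
}

constexpr std::size_t kN = std::size_t{1} << 20;  // N in the question's code
constexpr std::size_t kKiB = 1024;
constexpr std::size_t kMiB = 1024 * kKiB;

// Assumes sizeof(float) == 4, as on mainstream CUDA host platforms.
static_assert(arrayBytes(kN) == 4 * kMiB, "each of x and y is 4 MiB");
// If x and y are each migrated to the GPU once, the total is 8 MiB ("Total Size").
static_assert(2 * arrayBytes(kN) == 8 * kMiB, "Host To Device total is 8 MiB");
// 8 MiB in 128 KiB pieces yields the Count of 64 shown in the table.
static_assert(chunkCount(2 * arrayBytes(kN), 128 * kKiB) == 64, "64 transfers");
```

Because everything is `constexpr`, the checks run at compile time; the same helpers can also be called at run time with other sizes.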

My nvvp profiling result: [screenshot not preserved]

Answer: The operating system matters.

You are on Windows, and the CUDA Unified Memory (UM) system works quite differently on Windows than on Linux when Pascal or newer devices are involved.

On Windows, page faults are not the mechanism the UM system uses to decide when to migrate data, so they are not recorded or reported by the profiler.


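Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM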