CUDA architecture sm_11: compilation problem in Nsight

Question

My GPU, a Quadro FX 3700, doesn't support architectures above sm_11, so I can't use relocatable device code (rdc). I therefore combined all the utilities I need into one large file (say x.cu). As an overview, x.cu contains 2 classes with 5 member functions each, 20 device functions, 1 global kernel, and 1 kernel caller function.

Now, when I try to compile via Nsight, it just hangs with Build% showing 3. When I try compiling with

nvcc x.cu -o output -I"."

it shows the following messages and compiles only after a long time:

/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: Olimit was exceeded on function _Z18optimalOrderKernelPdP18PrepositioningCUDAdi; will not perform function-scope optimization.
    To still perform function-scope optimization, use -OPT:Olimit=0 (no limit) or -OPT:Olimit=45022
/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: To override Olimit for all functions in file, use -OPT:Olimit=45022
    (Compiler may run out of memory or run very slowly for large Olimit values)

Here optimalOrderKernel is the global kernel. Compiling shouldn't take this long. I want to understand the reason behind these messages, particularly Olimit.

Answer 1

Olimit is pretty clear, I think. It is a limit the compiler places on the amount of effort it will expend on optimizing code.

Most codes compile just fine using nvcc. However, no compiler is perfect, and some seemingly innocuous codes can cause the compiler to spend a long time in an optimization process that would normally be quick.

Since you haven't provided any code, I'm speaking in generalities.

Since the compiler occasionally spends a disproportionately long time in certain optimization phases, the Olimit warning gives you some idea of why compilation is taking so long. The Olimit also acts as a watchdog on an optimization process that is taking too long: when it is exceeded, certain optimization steps are aborted and a "less optimized" version of your code is generated instead.

I think the compiler messages you received are quite clear on how to modify the Olimit depending on your intentions. You can override it to increase the watchdog period, or disable it entirely (by setting it to zero). In that case, the compile process could take an arbitrarily long time and/or run out of memory, as the messages indicate.
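
As a rough sketch of how the override might be passed along: the -OPT:Olimit option belongs to the Open64-based backend, not to nvcc itself, so it has to be forwarded. The snippet below assumes an older CUDA toolkit whose nvcc accepts -Xopencc (--opencc-options) for sm_1x targets; the -arch=sm_11 flag and the OLIMIT and NVCC_CMD variable names are illustrative additions, not part of the original question. It only builds and prints the command so you can inspect it before running it.

```shell
# Assumption: an older CUDA toolkit where nvcc forwards options to the
# Open64 backend via -Xopencc (--opencc-options) when targeting sm_1x.
OLIMIT=45022   # value suggested by the warning; 0 would mean "no limit"

# Build the command line from the question, plus the forwarded Olimit override.
NVCC_CMD="nvcc -arch=sm_11 -Xopencc -OPT:Olimit=${OLIMIT} x.cu -o output -I."

# Print the command for inspection; run it manually once it looks right.
echo "$NVCC_CMD"
```

Setting OLIMIT to 0 removes the limit entirely, with the compile-time and memory risks the warning describes, so starting from the suggested value is probably the safer first try.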
