
C++ and CUDA speed in Visual Studio 2013

Original post: 2016-09-17 14:40:33. Tags: c++, visual-studio-2013, cuda

I'm running a data mining algorithm in VS 2013. I've implemented a CPU-based version (in a .cpp file) and a GPU-based version (in a CUDA 7.5 .cu file).

Both versions run as expected. The CPU-based version takes about 1500 seconds and the GPU version about 500 seconds.

I then combined both files into a single .cu file and used a flag to control which version runs. With all other parameters and code unchanged, the CPU version became faster in the .cu file: it takes only about 600 seconds.

Then I ran the same piece of C++ code (without CUDA) separately in an Empty C++ project and in a CUDA project, and the results were consistent: the .cu version takes 600 seconds, while the .cpp version takes 1500 seconds.
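One way to make this comparison controlled is to time the CPU routine with an identical harness in both projects. The sketch below is illustrative only: `cpu_workload` is a hypothetical stand-in for the real data-mining code, and `best_time_seconds` and `g_sink` are made-up helper names, not part of the original post. The idea is to compile the very same file once as `bench.cpp` in the Empty C++ project and once as `bench.cu` in the CUDA project.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical stand-in for the CPU data-mining routine (an assumption:
// replace it with the real code being compared).
double cpu_workload(const std::vector<double>& data) {
    double acc = 0.0;
    for (double x : data) acc += x * x;  // sum of squares as dummy work
    return acc;
}

// Results are written here so the optimizer cannot discard the work:
// stores to a volatile object are observable side effects.
volatile double g_sink = 0.0;

// Runs fn `repetitions` times and returns the fastest wall-clock time in
// seconds; taking the minimum reduces noise from other processes.
template <typename Fn>
double best_time_seconds(Fn fn, int repetitions) {
    assert(repetitions > 0);
    double best = 0.0;
    for (int i = 0; i < repetitions; ++i) {
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double>(stop - start).count();
        if (i == 0 || elapsed < best) best = elapsed;
    }
    return best;
}
```

Usage, for example from `main`: `best_time_seconds([&] { g_sink = cpu_workload(data); }, 3)`, then print the result. If the two builds of this identical file still differ by roughly 2.5×, the cause lies in how each project compiles it rather than in the source code.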

Why would this happen? Does it come from a different compiler or from a different initial environment of the VS project?

1 answer

Host code that nvcc passes to the host compiler is usually not a verbatim copy of the host portion of the .cu file as written by the programmer. Instead, nvcc parses and pre-processes the code and sends semantically identical code to the host compiler (a look at the intermediate files generated as part of the nvcc compilation trajectory will reveal the details). Due to artifacts in the host compiler's code generation, this can make host code run faster or slower when it is incorporated into a .cu file, compared to the stand-alone version in a .cpp file.

Usually, the resulting performance differences are quite small, up to about 10% in my experience. So the very significant performance difference reported here is either an extreme outlier of the scenario outlined above or, more likely in my opinion, the result of other differences in the compilation.
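To put the question's numbers next to that rule of thumb, the worked ratio below uses only the timings reported in the question:

```latex
\frac{t_{\mathrm{cpp}}}{t_{\mathrm{cu}}} \approx \frac{1500\,\mathrm{s}}{600\,\mathrm{s}} = 2.5,
\qquad
1 - \frac{t_{\mathrm{cu}}}{t_{\mathrm{cpp}}} = 1 - \frac{600}{1500} = 0.6
```

The .cu build therefore finishes in about 40% of the time, a 60% reduction, whereas a 10% codegen difference would correspond to a ratio of only about 1.1.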

For example, different compiler options, e.g. different optimization levels, could have been passed to the host compiler as part of the CUDA compilation versus the stand-alone compilation. If you enable a verbose log of the compilation process in MSVS that shows the details of the host compiler invocation, it should become apparent whether that is the case.
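As a complementary check from inside the program, each build can report some of the predefined macros it was compiled with. This is a sketch, not an exhaustive diagnostic: `build_config_summary` is a hypothetical helper name, and it relies on the documented nvcc macro `__CUDACC__`, the MSVC macros `_MSC_VER`, `_DEBUG`, `_M_X64`, `_M_IX86` and `_M_IX86_FP`, and the conventional `NDEBUG`. As far as I know, MSVC does not expose the optimization level (/Od vs. /O2) as a predefined macro, so the verbose build log remains the authoritative check for that.

```cpp
#include <string>

// Hypothetical helper: summarizes how this translation unit was compiled,
// based only on predefined macros. It cannot detect /Od vs. /O2.
std::string build_config_summary() {
    std::string s;
#ifdef __CUDACC__
    s += "compiled via nvcc; ";                    // .cu file routed through nvcc
#else
    s += "compiled directly by host compiler; ";   // plain .cpp build
#endif
#ifdef _MSC_VER
    s += "_MSC_VER=" + std::to_string(_MSC_VER) + "; ";  // MSVC version
#endif
#if defined(_DEBUG)
    s += "debug CRT (_DEBUG); ";                   // set by /MDd, /MTd or /LDd
#endif
#if defined(NDEBUG)
    s += "NDEBUG defined; ";                       // typical of Release configs
#else
    s += "NDEBUG not defined; ";
#endif
#if defined(_M_X64)
    s += "target x64";
#elif defined(_M_IX86)
    s += "target x86";
  #ifdef _M_IX86_FP
    s += " (_M_IX86_FP=" + std::to_string(_M_IX86_FP) + ")";  // /arch setting
  #endif
#else
    s += "target: other/non-MSVC";
#endif
    return s;
}
```

Printing this string next to the timing in both builds makes mismatches such as a debug runtime or a Win32 vs. x64 target visible at a glance; if the two summaries are identical, the optimization flags in the build log are the next place to look.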