

Time-dependent variation in CUDA kernel performance

I've been benchmarking some CUDA programs (2D and 3D lattice Boltzmann solvers) and have come across something unusual. I would expect some random variation in the performance of the solver over time, but across a variety of problem sizes, block sizes, operating systems and GPUs (not to mention that the 2D and 3D codes are completely separate programs, not different configurations of the same program), I see a very clear sinusoidal fluctuation in kernel execution times. On the two GPUs I've tested (K5000m and K20c), the variation seems to have a frequency in the 10-12 Hz range.

Is there any known explanation for this? My go-to idea is thermal/power management, but I haven't been able to prove it. Has anyone else experienced this?

FURTHER INFO AND AN EXAMPLE

An MSVC2010 project with a small example code can be found at https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxtYXJram1hd3NvbnxneDplOWMwNWNhNDA4MmMwMjg. The project requires CUDA 5.0 and an sm_30 device, although there is only one file, so building the project manually would be trivial. The code is fairly self-explanatory: 100 iterations of a simple kernel (by default, kernels that read from several arrays and write to several arrays) are timed, and the results are printed to a file. Performing an FFT of the execution times yields a visible peak near 11 Hz on a K5000m. I would post an image, but I don't have the reputation.
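The spectral check described above can be sketched in host-side C++. This is a minimal sketch, not the code from the linked project: `dominant_frequency_hz` is a hypothetical name, it uses a plain O(n²) DFT instead of an FFT library, and it assumes the timings were sampled at a fixed interval `dt_sec`, which a loop of kernel launches does not guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the frequency (in Hz) of the strongest non-DC component in a series
// of kernel execution times. ASSUMPTION: consecutive samples are taken a fixed
// dt_sec seconds apart; a loop of kernel launches does not guarantee this.
double dominant_frequency_hz(const std::vector<double>& exec_ms, double dt_sec) {
    const std::size_t n = exec_ms.size();
    assert(n >= 4 && dt_sec > 0.0);

    // Subtract the mean first: the fluctuation is small next to the typical
    // time, so this keeps round-off in the sums small.
    double mean = 0.0;
    for (double x : exec_ms) mean += x;
    mean /= static_cast<double>(n);

    const double pi = std::acos(-1.0);
    std::size_t best_k = 1;
    double best_power = -1.0;

    // Plain O(n^2) DFT over bins 1..n/2, i.e. up to the Nyquist frequency.
    for (std::size_t k = 1; k <= n / 2; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double phase = 2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(i) / static_cast<double>(n);
            re += (exec_ms[i] - mean) * std::cos(phase);
            im -= (exec_ms[i] - mean) * std::sin(phase);
        }
        const double power = re * re + im * im;
        if (power > best_power) {
            best_power = power;
            best_k = k;
        }
    }

    // Bin k sits at k / (n * dt_sec) Hz, so the bin spacing is 1 / (total duration).
    return static_cast<double>(best_k) / (static_cast<double>(n) * dt_sec);
}
```

For example, with 200 samples taken 10 ms apart (2 s of data) the bins are 0.5 Hz apart, so an 11 Hz component falls in bin 22. An FFT library (cuFFT or FFTW, for instance) would be faster, but for a few hundred samples the direct sum is adequate.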

Windows affects the detailed timing of GPU kernel execution, especially when the GPU runs in WDDM mode. Please re-run your observation and FFT, preferably in a Linux environment where X is not also running on the GPU; this will give you the most consistent behavior. In a WDDM setup, the CUDA driver is to some degree subject to the Windows operating system.

I ran your code on an SM35 device with CentOS 5.5 and CUDA 5.5, and got the following Times.dat output:

0.007648 0.0024 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.001888 0.00192 0.00192 0.001856 0.00192 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.001888 0.001888 0.001856 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.002016 0.001888 0.001888 0.00192 0.001952 0.001888 0.001888 0.001888 0.001888 0.00192 0.00192 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.001888 0.001856 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.003904 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.00192 0.001856 0.001888 0.001856 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001952 0.001888 0.001888 0.00192 0.00192 0.00192 0.001888 0.001888 0.001952 0.001888 0.00192 0.001888 0.001856 0.001888 0.00192 0.001888 0.001888 0.001888 0.00192 0.001856 0.001888 0.001888 0.001888 0.001888 0.00192 0.00192 0.001888 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.001888 0.00192 0.001888 0.001888 0.00192 0.001888 0.00192 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.004448 0.001888 0.001952 0.001888 0.001888 0.001888 0.001888 0.001888 0.001856 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.00192 0.001856 0.001888 0.001888 0.001888 0.001888 0.001856 0.001888 0.001888 0.001856 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888 0.001888

There is occasional variation; however, don't forget that your cudaTime variable is capturing a time in milliseconds. So the variation I see in the above data is mostly less than 1 microsecond, run to run.

Throwing out the first number, the largest variation I see is about 2-3 microseconds in a few cases. Given that the measured execution time is typically less than 2 microseconds, this is a large variation, but it's still in the noise and nothing like the tens of microseconds you're reporting.
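The unit conversion and the "throw out the first number" step can be made explicit with a small helper. A minimal sketch, assuming the input is the list of per-iteration times in milliseconds (the units the answer attributes to `cudaTime`); `TimingSummary` and `summarise_ms` are hypothetical names, and measuring spread against the median is one reasonable choice, not the answerer's stated method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one run of kernel timings.
struct TimingSummary {
    double median_us;      // typical kernel time, in microseconds
    double max_excess_us;  // slowest sample minus the median, in microseconds
};

// Summarises kernel times recorded in milliseconds. Mirroring the answer, the
// first sample is dropped, since first launches are often slower.
TimingSummary summarise_ms(const std::vector<double>& times_ms) {
    assert(times_ms.size() >= 2);

    // Skip the first sample and convert milliseconds to microseconds.
    std::vector<double> us;
    us.reserve(times_ms.size() - 1);
    for (std::size_t i = 1; i < times_ms.size(); ++i) {
        us.push_back(times_ms[i] * 1000.0);
    }

    std::sort(us.begin(), us.end());
    const std::size_t m = us.size();
    const double median = (m % 2 == 1) ? us[m / 2]
                                       : 0.5 * (us[m / 2 - 1] + us[m / 2]);
    return {median, us.back() - median};
}
```

For example, a 0.004448 ms outlier against a typical 0.001888 ms gives an excess of about 2.56 µs, which is the scale of variation described above.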

To my untrained eye, I also don't see any sinusoidal pattern in the data, but if you tell me there's an 11 Hz frequency in there (I'm not even sure what that means, since these data points are not time-stamped as far as I can see), I'll take your word for it.
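The point about time stamps can be made concrete: a frequency in Hz only exists once each sample has a time. A minimal sketch, assuming each iteration's host start time was recorded in seconds (for instance from `std::chrono::steady_clock`); `SpectrumAxis` and `spectrum_axis` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Frequency axis implied by a series of timestamped samples.
struct SpectrumAxis {
    double sample_rate_hz;  // mean number of samples per second
    double nyquist_hz;      // highest frequency the series can represent
    double resolution_hz;   // spacing between adjacent DFT bins
};

// Derives the DFT frequency axis from per-sample host timestamps in seconds.
// Without timestamps a list of kernel times has no time axis, so a spectral
// peak cannot be labelled in Hz. Assumes roughly uniform sampling.
SpectrumAxis spectrum_axis(const std::vector<double>& t_sec) {
    assert(t_sec.size() >= 2 && t_sec.back() > t_sec.front());
    const double n = static_cast<double>(t_sec.size());
    const double dt = (t_sec.back() - t_sec.front()) / (n - 1.0);  // mean interval
    const double fs = 1.0 / dt;
    return {fs, fs / 2.0, fs / n};
}
```

For instance, 100 samples spaced 2 µs apart give a 500 kHz sample rate but a 5 kHz bin spacing, so such a record cannot resolve an 11 Hz component; the bin spacing is set by the total duration of the record, not by the number of samples.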

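Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.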

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM