
Why does my GPU code run much slower than the CPU?

This is one of the standard example snippets found everywhere...

import time
import numpy

import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
import pycuda.autoinit

size = int(1e7)  # linspace requires an integer sample count in recent numpy

# CPU: compute sin over the whole array with numpy
t0 = time.time()
x = numpy.linspace(1, size, size).astype(numpy.float32)
y = numpy.sin(x)
t1 = time.time()

cpuTime = t1-t0
print(cpuTime)

# GPU: copy to device, compute sin with pycuda, copy the result back
t0 = time.time()
x_gpu = gpuarray.to_gpu(x)
y_gpu = cumath.sin(x_gpu)
y = y_gpu.get()  # .get() blocks until the GPU work is done
t1 = time.time()

gpuTime = t1-t0
print(gpuTime)

The results are: 200 ms for the CPU and 2.45 s for the GPU... more than 10X slower.

I'm running on Windows 10... VS 2015 with PTVS...

Best regards...

Steph

It looks like pycuda introduces some additional overhead the first time you call the cumath.sin() function (~400 ms on my system). I suspect this is due to the need to compile CUDA code for the function being called. More importantly, this overhead is independent of the size of the array being passed to the function. Additional calls to cumath.sin() are much faster, since the CUDA code has already been compiled. On my system, the GPU code given in the question runs in about 20 ms (for repeated runs), compared to roughly 130 ms for the numpy code.

I don't profess to know much at all about the inner workings of pycuda, so I would be interested to hear other people's opinions on this.
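One way to separate that one-time compilation cost from steady-state throughput is to add an untimed warm-up call before measuring. A minimal sketch of such a timing harness (shown here with numpy.sin so it runs anywhere; the same harness would apply unchanged to a pycuda function such as cumath.sin, which is an assumption since the answer does not give its own timing code):

```python
import time
import numpy

def time_after_warmup(fn, arg, repeats=5):
    """Average the runtime of fn(arg), excluding the first (warm-up) call.

    The warm-up call absorbs one-time costs such as pycuda's
    kernel compilation, so the average reflects steady-state speed.
    """
    fn(arg)  # warm-up: not timed
    t0 = time.time()
    for _ in range(repeats):
        fn(arg)
    return (time.time() - t0) / repeats

x = numpy.linspace(1, 10**7, 10**7, dtype=numpy.float32)
cpu_time = time_after_warmup(numpy.sin, x)
print(f"numpy.sin: {cpu_time * 1e3:.1f} ms per call")
```

Comparing warmed-up averages rather than single cold calls is what makes the ~20 ms vs. ~130 ms comparison above meaningful.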

Note: the technical posts on this site follow the CC BY-SA 4.0 license; please credit the original source when republishing.
