Why does my GPU code run so much slower than the CPU?

Question

This is one of the standard example snippets you can find just about anywhere...

import time
import numpy

import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
import pycuda.autoinit

size = int(1e7)  # numpy.linspace needs an integer sample count

t0 = time.time()
x = numpy.linspace(1, size, size).astype(numpy.float32)
y = numpy.sin(x)
t1 = time.time()

cpuTime = t1-t0
print(cpuTime)

t0 = time.time()
x_gpu = gpuarray.to_gpu(x)
y_gpu = cumath.sin(x_gpu)
y = y_gpu.get()
t1 = time.time()

gpuTime = t1-t0
print(gpuTime)

The result: 200 ms for the CPU and 2.45 s for the GPU... more than 10 times slower.

I'm on Windows 10... VS 2015 with PTVS...

Best regards...

Stephan

Answer 1

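It seems that pycuda incurs some extra overhead the first time cumath.sin() is called (about 400 ms on my system). I suspect this is because the CUDA code for the function being called has to be compiled. More importantly, this overhead is independent of the size of the array passed to the function. Once the CUDA code has been compiled, further calls to cumath.sin() are much faster. On my system, the GPU code given in the question runs in about 20 ms (for repeated runs), while the numpy code takes about 130 ms.

I don't claim to fully understand the inner workings of pycuda, so I would be interested to hear what others think about this.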
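One way to check this on your own machine is to time the first call separately from repeated calls. The following is only a minimal sketch: `time_first_and_repeat` is a helper name made up for illustration, and `fake_gpu_sin` is a hypothetical stand-in that merely simulates a one-time setup cost with `time.sleep`, so the snippet runs without a GPU.

```python
import time


def time_first_and_repeat(fn, repeats=5):
    """Return (first_call_seconds, best_repeat_seconds) for a zero-argument callable."""
    t0 = time.perf_counter()
    fn()  # first call: includes any one-time setup, e.g. kernel compilation
    first = time.perf_counter() - t0

    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()  # later calls: setup has already been paid for
        best = min(best, time.perf_counter() - t0)
    return first, best


# Hypothetical stand-in for a GPU call; the sleep durations are made-up numbers.
_already_compiled = set()


def fake_gpu_sin():
    if "sin" not in _already_compiled:
        time.sleep(0.2)   # simulated one-time compilation cost
        _already_compiled.add("sin")
    time.sleep(0.01)      # simulated per-call work


first, steady = time_first_and_repeat(fake_gpu_sin)
print(f"first call: {first * 1e3:.0f} ms, repeated calls: {steady * 1e3:.0f} ms")
```

With pycuda, the callable could be something like `lambda: cumath.sin(x_gpu).get()`, with `x_gpu` created beforehand as in the question. Including `.get()` copies the result back to the host, so the measured time covers the finished computation and not just the kernel launch.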

Solution 1
2 2016-06-22 04:26:47
