Theano: Differences between CPU and GPU matrix dot products compared with NumPy

Question

I recently got Theano working on Windows 10 with CUDA v7.5, cuDNN v3, and Visual Studio 2013 Community Edition. To verify that it was working correctly, I ran the following code from the Theano Windows install page on both the CPU and the GPU:

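import numpy as np
import time
import theano
# Two random 10000x10000 matrices in Theano's float type (floatX = float32 here)
A = np.random.rand(10000, 10000).astype(theano.config.floatX)
B = np.random.rand(10000, 10000).astype(theano.config.floatX)
# Time the NumPy product (CPU, via NumPy's BLAS)
np_start = time.time()
AB = A.dot(B)
np_end = time.time()
# Build and compile the equivalent Theano function; compilation is not timed
X, Y = theano.tensor.matrices('XY')
mf = theano.function([X, Y], X.dot(Y))
# Time the Theano product (runs on the GPU when device = gpu)
t_start = time.time()
tAB = mf(A, B)
t_end = time.time()
print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" % (
    np_end - np_start, t_end - t_start))
print("Result difference: %f" % (np.abs(AB - tAB).max(),))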

I got the following results:

G:\ml\Theano\Projects>python Test.py
NP time: 10.585000[s], theano time: 10.587000[s] (times should be close when run on CPU!)
Result difference: 0.000000

G:\ml\Theano\Projects>python Test.py
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled)
NP time: 10.838000[s], theano time: 1.294000[s] (times should be close when run on CPU!)
Result difference: 0.022461

As you can see, there is a fairly significant difference of 0.022 when the calculation runs on the GPU. I'm wondering whether this is expected or whether I am doing something wrong.

Here is my .theanorc:

[global]
device = gpu
floatX = float32

[nvcc]
fastmath = True

Answer 1

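The GPU doesn't do the additions and multiplications in the same order as the CPU. Because floating-point arithmetic is not exact, every addition rounds, and the accumulated rounding error depends on the order in which the terms are added, so it is normal to see some differences.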
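The order effect can be reproduced without a GPU. The sketch below is NumPy only; the helpers sequential_sum and pairwise_sum are hypothetical illustrations, and the pairwise tree reduction is only a stand-in for "a different order", not the actual order cuBLAS uses. It sums the same 10000 float32 values (the length of one dot product in the question) left to right and as a tree, then compares both with a float64 reference.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical data: 10000 float32 values in [0, 1), the length of one
# row-times-column dot product in the question's 10000x10000 matrices
x = rng.rand(10000).astype(np.float32)

def sequential_sum(values):
    """Add left to right, rounding to float32 after every addition."""
    acc = np.float32(0.0)
    for v in values:
        acc = np.float32(acc + v)
    return acc

def pairwise_sum(values):
    """Add neighbouring pairs level by level (a tree reduction) in float32."""
    v = values.astype(np.float32)
    while v.size > 1:
        if v.size % 2:  # pad odd-length levels with an exact zero
            v = np.concatenate([v, np.zeros(1, dtype=np.float32)])
        v = v[0::2] + v[1::2]
    return v[0]

exact = x.astype(np.float64).sum()  # float64 reference
s_seq = sequential_sum(x)
s_pair = pairwise_sum(x)
print("sequential: %.6f  pairwise: %.6f  float64: %.6f" % (s_seq, s_pair, exact))
```

Both results are float32 sums of identical inputs; any gap between them comes only from the order of the additions.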

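An absolute difference of that size can be normal if the relative difference is small. Here each entry of AB is a sum of 10000 products of values from [0, 1) (np.random.rand), so entries average about 2500; the maximum difference of 0.022 is therefore a relative difference of roughly 1e-5, which is within what float32 rounding over 10000-term sums can produce.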
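One way to see what counts as "small" is to measure both quantities directly. In this sketch (NumPy only; the helper max_abs_and_rel_diff and the 1000x1000 size are illustrative assumptions, not from the question), a float32 product is compared with a float64 reference built from the same inputs, so the reported differences come mostly from float32 rounding.

```python
import numpy as np

def max_abs_and_rel_diff(x, y):
    """Largest absolute and largest relative element-wise difference of x from reference y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    abs_diff = np.abs(x - y)
    # Relative to the magnitude of the reference entry; guard against division by zero
    rel_diff = abs_diff / np.maximum(np.abs(y), np.finfo(np.float64).tiny)
    return abs_diff.max(), rel_diff.max()

rng = np.random.RandomState(0)
n = 1000  # smaller than the question's 10000 so it runs quickly
A = rng.rand(n, n).astype(np.float32)
B = rng.rand(n, n).astype(np.float32)

ab32 = A.dot(B)                                        # float32 product, as in the question
ab64 = A.astype(np.float64).dot(B.astype(np.float64))  # float64 reference

abs_d, rel_d = max_abs_and_rel_diff(ab32, ab64)
print("max abs diff: %g, max rel diff: %g" % (abs_d, rel_d))
```

Entries here are around n / 4 = 250, so the absolute difference is roughly the relative difference scaled by that magnitude; at n = 10000 the same relative error shows up as a larger absolute number.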

To compare them more "correctly", use theano.tensor.basic._allclose(result1, result2), which checks that the two results agree within a tolerance instead of requiring them to be identical.
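Since theano.tensor.basic._allclose is a private Theano helper, here is a NumPy-only stand-in using numpy.allclose, which checks |a - b| <= atol + rtol * |b| element-wise. The wrapper close_enough and its rtol/atol values are hypothetical choices for float32 data, not Theano's own defaults. To mimic a different accumulation order, the sketch splits the inner dimension in two and adds the partial products, loosely like a blocked GPU kernel might.

```python
import numpy as np

def close_enough(result1, result2, rtol=1e-4, atol=1e-5):
    # Tolerance-based comparison: |r1 - r2| <= atol + rtol * |r2| element-wise.
    # The rtol/atol values are illustrative assumptions for float32 results.
    return np.allclose(result1, result2, rtol=rtol, atol=atol)

rng = np.random.RandomState(0)
A = rng.rand(500, 500).astype(np.float32)
B = rng.rand(500, 500).astype(np.float32)

one_pass = A.dot(B)
# Same product, accumulated in a different order: two halves of the inner dimension
split_order = A[:, :250].dot(B[:250, :]) + A[:, 250:].dot(B[250:, :])

print("max abs diff:", np.abs(one_pass - split_order).max())  # small, typically non-zero
print("close enough:", close_enough(one_pass, split_order))
```

Exact equality (==) is the wrong test for results computed in different orders; a tolerance check like this accepts rounding noise while still catching a genuinely wrong result.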

(Answer 1: score 4, accepted, 2015-11-30 20:35:05)
