
CUDA floating-point precision

Can someone comment on this?

I want to do a vector dot product. My float vectors are [2080:2131] and [2112:2163]; each of them contains 52 elements.

float a[52] = {2080, 2081, 2082, ..., 2129, 2130, 2131};
float b[52] = {2112, 2113, 2114, ..., 2161, 2162, 2163};

float sum = 0.0f;  // accumulator assumed to be float, like the inputs
for (int i = 0; i < 52; i++)
{
    sum += a[i] * b[i];
}

The sum over the whole length (52 elements) was 234038032 from my kernel, while MATLAB gave 234038038. For the sum of products over the first 1 to 9 elements, my kernel result agrees with the MATLAB result. For the 10-element sum it is off by 1, and the difference gradually increases. The results were reproducible. I checked all the elements and found no problem.

Since the vectors are float, you are experiencing rounding errors. A float has a 24-bit significand, so once the running sum passes 2^24 = 16777216 it can no longer hold every integer exactly. MATLAB stores everything with much higher precision (double) and hence won't see the rounding errors so early.

You may want to check out What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg; it is invaluable reading.

Simple demo in C++ (i.e. nothing to do with CUDA):

#include <iostream>

int main()
{
  // Same values stored twice: once as float, once as double.
  float a[52];
  float b[52];
  double c[52];
  double d[52];

  for (int i = 0 ; i < 52 ; i++)
  {
    a[i] = (float)(2080 + i);
    b[i] = (float)(2112 + i);
    c[i] = (double)(2080 + i);
    d[i] = (double)(2112 + i);
  }

  // Accumulate the dot product in each precision.
  float fsum = 0.0f;
  double dsum = 0.0;
  for (int i = 0 ; i < 52 ; i++)
  {
    fsum += a[i]*b[i];
    dsum += c[i]*d[i];
  }

  // Print enough digits to show the difference.
  std::cout.precision(20);
  std::cout << fsum << " " << dsum << std::endl;
}

Run this and you get:

234038032 234038038

So what can you do about this? There are several directions you could go in...

  • Use higher precision: this will affect performance, and not all devices support double precision. It also just postpones the problem rather than fixing it, so I would not recommend it!
  • Do a tree-based reduction: you could combine the techniques in the vectorAdd and reduction SDK samples.
  • Use Thrust: very straightforward.
