Outputting values from CAMPARY

I'm trying to use the CAMPARY library (CudA Multiple Precision ARithmetic librarY). I've downloaded the code and included it in my project. Since it supports both CPU and GPU, I'm starting with the CPU to understand how it works and make sure it does what I need. But the intent is to use this with CUDA.

I'm able to instantiate an instance and assign a value, but I can't figure out how to get things back out. Consider:

#include <stdio.h>  // printf
#include <stdlib.h> // free
#include <time.h>
#include "c:\\vss\\CAMPARY\\Doubles\\src_cpu\\multi_prec.h"

int main()
{
    const char *value = "123456789012345678901234567";

    multi_prec<2> a(value);

    a.prettyPrint();
    a.prettyPrintBin();
    a.prettyPrintBin_UnevalSum();
    char *cc = a.prettyPrintBF();
    printf("\n%s\n", cc);
    free(cc);
}

This compiles, links, and runs (VS 2017). But the output is pretty unhelpful:

Prec = 2
   Data[0] = 1.234568e+26
   Data[1] = 7.486371e+08

Prec = 2
   Data[0] = 0x1.987bf7c563caap+86;
   Data[1] = 0x1.64fa5c3800000p+29;

0x1.987bf7c563caap+86 + 0x1.64fa5c3800000p+29;

1.234568e+26 7.486371e+08

Printing each of the doubles like this might be easy to do, but it doesn't tell you much about the value of the 128-bit number being stored. Performing highly accurate computations is of limited value if there's no way to output the results.

In addition to just printing out the value, eventually I also need to convert these numbers to ints (I'm willing to try doing it all in floats if there's a way to print, but I fear that both accuracy and speed will suffer). Unlike MPIR (which doesn't support CUDA), CAMPARY doesn't have any associated multi-precision int type, just floats. I can probably cobble together what I need (mostly just add/subtract/compare), but only if I can get the integer portion of CAMPARY's values back out, which I don't see a way to do.

CAMPARY doesn't seem to have any docs, so it's conceivable these capabilities are there and I've simply overlooked them. I'd rather ask on a CAMPARY discussion forum or mailing list, but there doesn't seem to be one. That's why I'm asking here.

To sum up:

  1. Is there any way to output the 128-bit ( multi_prec<2> ) values from CAMPARY?
  2. Is there any way to extract the integer portion from a CAMPARY multi_prec? Perhaps one of the (many) math functions in the library that I don't understand computes this?

There are really only 2 possible answers to this question:

  1. There's another (better) multi-precision library that works on CUDA that does what you need.
  2. Here's how to modify this library to do what you need.

The only people who could give the first answer are CUDA programmers. Unfortunately, if there were such a library, I feel confident talonmies would have known about it and mentioned it.

As for #2, why would anyone update this library if they weren't a CUDA programmer? There are other, much better multi-precision libraries out there. The ONLY benefit CAMPARY offers is that it supports CUDA. Which means the only people with any real motivation to work with or modify the library are CUDA programmers.

And, as the CUDA programmer with the most vested interest in solving this, I did figure out a solution (albeit an ugly one). I'm posting it here in the hope that the information will be of value to future CAMPARY programmers. There's not much information out there for this library, so this is a start.


The first thing you need to understand is how CAMPARY stores its data. And, while not complex, it isn't what I expected. Coming from MPIR, I assumed that CAMPARY stored its data pretty much the same way: a fixed-size exponent followed by an arbitrary number of bits for the mantissa.

But nope, CAMPARY went a different way. Looking at the code, we see:

private:
    double data[prec];

Now, I assumed that this was just an arbitrary way of reserving the number of bits they needed. But no, they really do use prec doubles. Like so:

multi_prec<8> a("2633716138033644471646729489243748530829179225072491799768019505671233074369063908765111461703117249");

    // Looking at a in the VS debugger:

    [0] 2.6337161380336443e+99  const double
    [1] 1.8496577979210756e+83  const double
    [2] 1.2618399223120249e+67  const double
    [3] -3.5978270144026257e+48 const double
    [4] -1.1764513205926450e+32 const double
    [5] -2479038053160511.0 const double
    [6] 0.00000000000000000 const double
    [7] 0.00000000000000000 const double

So, what they are doing is storing as much precision as possible in the first double; the remainder is then used to compute the next double, and so on, until the doubles cover the entire value or the precision runs out (dropping the least significant bits). Note that some of these are negative, which means the sum of the preceding values is a bit larger than the actual value, and the negative terms correct it downward.
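
To make the storage scheme concrete, here is a minimal sketch of the same idea on a much smaller scale: splitting a 64-bit integer into a leading double plus a correction double. `TwoDoubles` and `split_int64` are hypothetical names made up for illustration; they are not part of CAMPARY, and the library's own conversion code is presumably more general than this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type, not a CAMPARY type.
struct TwoDoubles {
    double hi;  // the double closest to the full value
    double lo;  // whatever is left over (may be negative)
};

// Split n into hi + lo so that the two doubles together hold n exactly.
// Assumption: |n| < 2^62, so converting hi back to int64 cannot overflow.
TwoDoubles split_int64(std::int64_t n) {
    const std::int64_t limit = std::int64_t(1) << 62;
    assert(n > -limit && n < limit);

    TwoDoubles r;
    r.hi = static_cast<double>(n);  // rounds to a nearby representable double
    // The leftover is tiny compared to hi, so it fits in a double exactly.
    r.lo = static_cast<double>(n - static_cast<std::int64_t>(r.hi));
    return r;
}
```

For example, 9007199254740995 (2^53 + 3) has no exact double representation. With the usual round-to-nearest conversion, `hi` should land on 9007199254740996 and `lo` on -1, which mirrors the negative entries in the debugger listing above: the leading term overshoots and the next one corrects it downward.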

With this in mind, we return to the question of how to print it.

In theory, you could just add all these together to get the right answer. But kinda by definition, we already know that C doesn't have a datatype to hold a value this size. Other libraries do, though (say, MPIR). Now, MPIR doesn't work on CUDA, but it doesn't need to. You don't want to have your CUDA code printing out data. That's something you should be doing from the host anyway. So do the computations with the full power of CUDA, cudaMemcpy the results back, then use MPIR to print them out:

#include <stdio.h> // must come before mpir.h so mpf_out_str is declared
#include <mpir.h>  // from mpir at mpir.org

#define MPREC 8
void ShowP(const multi_prec<MPREC> &value)
{
    mpf_t mp, mp2;

    mpf_init2(mp, value.getPrec() * 64); // Make sure we reserve enough room
    mpf_init(mp2); // Only needs to hold one double.

    const double *ptr = value.getData();

    // Start with the most significant component.
    mpf_set_d(mp, ptr[0]);

    // Add the remaining components, from most to least significant.
    for (int x = 1; x < value.getPrec(); x++)
    {
        // MPIR doesn't have a mpf_add_d, so we need to load the value into
        // an mpf_t.
        mpf_set_d(mp2, ptr[x]);
        mpf_add(mp, mp, mp2);
    }

    // Using base 10, write the full precision (0) of mp, to stdout.
    mpf_out_str(stdout, 10, 0, mp);

    mpf_clears(mp, mp2, NULL);
}

Used with the number stored in the multi_prec above, this outputs the exact same value. Yay.

It's not a particularly elegant solution. Having to add a second library just to print a value from the first is clearly sub-optimal. And this conversion can't be all that speedy either. But printing is typically done (much) less frequently than computing. If you do an hour's worth of computing and a handful of prints, the performance doesn't much matter. And it beats the heck out of not being able to print at all.
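
If pulling in MPIR just for printing is unappealing, and the value is known to be a non-negative integer (as in the examples here), the same sum can be done by hand in decimal. What follows is only a rough sketch under those assumptions, not something taken from CAMPARY or MPIR: `Digits`, `digits_from_double`, `add_into`, `sub_from`, and `expansion_to_decimal` are all hypothetical helper names. The input is a plain array of doubles, such as the one returned by `getData()` in the function above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decimal digits of a non-negative integer, least significant digit first.
using Digits = std::vector<int>;

// Exactly convert a non-negative, integer-valued, finite double to decimal.
Digits digits_from_double(double x) {
    assert(std::isfinite(x) && x >= 0.0 && x == std::floor(x));
    int exp = 0;
    double frac = std::frexp(x, &exp);  // x = frac * 2^exp, 0.5 <= frac < 1
    auto m = static_cast<std::uint64_t>(std::ldexp(frac, 53));  // 53-bit mantissa
    int shift = exp - 53;               // now x = m * 2^shift
    if (shift < 0) { m >>= -shift; shift = 0; }  // dropped bits are zero for integers

    Digits d;
    for (; m != 0; m /= 10) d.push_back(static_cast<int>(m % 10));
    for (int i = 0; i < shift; ++i) {   // multiply by 2, shift times
        int carry = 0;
        for (int &digit : d) {
            int v = digit * 2 + carry;
            digit = v % 10;
            carry = v / 10;
        }
        if (carry) d.push_back(carry);
    }
    return d;
}

// a += b
void add_into(Digits &a, const Digits &b) {
    if (a.size() < b.size()) a.resize(b.size(), 0);
    int carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int v = a[i] + (i < b.size() ? b[i] : 0) + carry;
        a[i] = v % 10;
        carry = v / 10;
    }
    if (carry) a.push_back(carry);
}

// a -= b; the caller must guarantee a >= b.
void sub_from(Digits &a, const Digits &b) {
    int borrow = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int v = a[i] - (i < b.size() ? b[i] : 0) - borrow;
        borrow = v < 0 ? 1 : 0;
        a[i] = v + 10 * borrow;
    }
    while (!a.empty() && a.back() == 0) a.pop_back();  // strip leading zeros
}

// Sum n integer-valued doubles exactly and return the total as a decimal string.
// Assumption: the overall total is non-negative.
std::string expansion_to_decimal(const double *data, int n) {
    Digits pos, neg;  // positive and negative components are summed separately
    for (int i = 0; i < n; ++i)
        add_into(data[i] >= 0 ? pos : neg, digits_from_double(std::fabs(data[i])));
    sub_from(pos, neg);

    std::string s;
    for (auto it = pos.rbegin(); it != pos.rend(); ++it)
        s += static_cast<char>('0' + *it);
    return s.empty() ? "0" : s;
}
```

Fed the two components from the question's output (0x1.987bf7c563caap+86 and 0x1.64fa5c3800000p+29), this should give back 123456789012345678901234567. It makes no attempt to handle fractional parts or negative totals, and repeated doubling in decimal is slow, so for anything beyond occasional printing of integers a real multi-precision library is still the better choice.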

CAMPARY has a lot of shortcomings (undocumented, unsupported, unmaintained). But for people who need multi-precision numbers on CUDA (especially if you need sqrt), it's the best option I've found.
