How should GMP/MPFR limbs be interpreted?

The arbitrary precision libraries GMP and MPFR use heap-allocated arrays of machine word-sized integers to store the limbs that make up the high precision number/mantissa.

How should this array of limbs be interpreted to recover the arbitrary precision integer number? In other words: for N limbs holding B bits each, how should I interpret them to recover the N*B bit number?

Does the limb size really affect the in-memory representation (see below)? If so, what is the rationale behind this?


Background:

I wrote a small program to look inside the representation, but I was confused by what I saw. The limbs seem to be ordered in most significant digit order, whereas the limbs themselves are in native least significant digit format. When representing the 64-bit word 0xAAAABBBBCCCCDDDD using 32-bit words and precision fixed to 128 bits, I see:

% c++ limbs.cpp -lgmp -lmpfr -o limbs && ./limbs
ccccdddd|aaaabbbb|00000000|00000000
00000000|00000000|ccccdddd|aaaabbbb

This seems to imply that the in-memory representation cannot be read back as a string of bits to recover the arbitrary precision number (e.g., if it were loaded into a register on a machine that supported N*B-bit words). Furthermore, this also seems to suggest that the limb size changes the representation, so that I would not be able to deserialize a number without knowing which limb size was used to serialize it.

Here's my test program (uses 32-bit limbs with the __GMP_SHORT_LIMB macro):

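// Changes the mp_limb_t typedef in this translation unit only; the linked
// libgmp/libmpfr keep whatever limb size they were built with.
#define __GMP_SHORT_LIMB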
#include <gmp.h>
#include <mpfr.h>

#include <iomanip>
#include <iostream>

constexpr int PRECISION = 128;

void PrintLimbs(mp_limb_t const *const limbs) {
  std::cout << std::hex;
  constexpr int NUM_LIMBS = PRECISION / (8 * sizeof(mp_limb_t));
  for (int i = 0; i < NUM_LIMBS; ++i) {
    std::cout << std::setfill('0') << std::setw(2 * sizeof(mp_limb_t)) << limbs[i];
    if (i < NUM_LIMBS - 1) {
      std::cout << "|";
    }
  }
  std::cout << "\n";
}

int main() {
  {  // GMP
    mpz_t num;
    mpz_init2(num, PRECISION);
    mpz_set_ui(num, 0xAAAABBBBCCCCDDDD);
    PrintLimbs(num->_mp_d);
    mpz_clear(num);
  }
  {  // MPFR
    mpfr_t num;
    mpfr_init2(num, PRECISION);
    mpfr_set_ui(num, 0xAAAABBBBCCCCDDDD, MPFR_RNDN);
    PrintLimbs(num->_mpfr_d);
    mpfr_clear(num);
  }
  return 0;
}

Three things matter for the byte representation:

  • The limb size depends on your machine and the chosen ABI. The number of usable bits per limb is also affected by the optional presence of nails (an experimental feature, so it is unlikely that limbs have nails). MPFR does not support the presence of nails.
  • The limb representation in memory follows the endianness of the machine.
  • Limbs are stored least significant limb first (aka little endian).

Note that it follows from the last two points that, on the same big-endian machine, the byte representation of the array will depend on the limb size.
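The interaction of the last two points can be sketched without GMP at all. The helper below is hypothetical (MemoryImage is not a GMP or MPFR function): it lays a number out as limbs, least significant limb first, and orders the bytes inside each limb according to a chosen endianness. It assumes the number of bytes is a multiple of the limb size.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of GMP/MPFR.
// value_bytes_le: the number's bytes, least significant byte first.
// limb_bytes:     size of one limb in bytes (must divide the byte count).
// big_endian:     byte order used inside each limb.
// Returns the resulting memory image as a hex string, lowest address first.
std::string MemoryImage(const std::vector<std::uint8_t>& value_bytes_le,
                        std::size_t limb_bytes, bool big_endian) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  // Limbs are always laid out least significant limb first.
  for (std::size_t limb = 0; limb < value_bytes_le.size(); limb += limb_bytes) {
    for (std::size_t i = 0; i < limb_bytes; ++i) {
      // A little-endian limb keeps the byte order; a big-endian limb reverses it.
      const std::size_t idx = big_endian ? limb + (limb_bytes - 1 - i) : limb + i;
      const std::uint8_t byte = value_bytes_le[idx];
      out += kHex[byte >> 4];
      out += kHex[byte & 0xF];
    }
  }
  return out;
}
```

For the value 0x0102030405060708, the little-endian image is 0807060504030201 for both 4-byte and 8-byte limbs, whereas the big-endian image is 0102030405060708 with 8-byte limbs and 0506070801020304 with 4-byte limbs.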

The size of the array of limbs depends on the type. For instance, with the mpn layer of GMP, it is entirely handled by the user.
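Given a pointer to the limbs and their count, as in the mpn style, the value is the sum of limb[i] * 2^(B*i) over all i. As a minimal sketch, assuming 32-bit limbs without nails, the hypothetical helper LimbsToHex below (not a GMP function) writes that value in hex by walking the limbs from the most significant one down:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of GMP: formats n 32-bit limbs, stored
// least significant limb first, as one hex number (most significant digit first).
std::string LimbsToHex(const std::uint32_t* limbs, std::size_t n) {
  std::string out;
  char buf[9];  // 8 hex digits plus the terminating NUL
  // The last limb carries the highest weight, so it is printed first.
  for (std::size_t i = n; i-- > 0;) {
    std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(limbs[i]));
    out += buf;
  }
  return out;
}
```

Applied to the four limbs ccccdddd, aaaabbbb, 00000000, 00000000 from the question's GMP output, this gives 0000000000000000aaaabbbbccccdddd.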

For MPFR, the size is deduced from the precision of the mpfr_t object; if the precision is not a multiple of the limb bitsize, the trailing bits are always set to 0. Note also that more memory may be allocated than is actually used; the allocated size must not be confused with the size of the array. You can ignore this fact, as the unused data always comes after the actual array of limbs.
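The relation between precision and array size can be sketched as a rounding-up division. The names LimbCount and UnusedLowBits are hypothetical, not MPFR API, and the second function assumes that the "trailing bits" mentioned above are the low-order bits of the least significant limb.

```cpp
#include <cstddef>

// Hypothetical helpers, not part of MPFR.

// Number of limbs needed to hold precision_bits bits: ceil(precision / limb size).
constexpr std::size_t LimbCount(std::size_t precision_bits, std::size_t limb_bits) {
  return (precision_bits + limb_bits - 1) / limb_bits;
}

// Bits in the array that lie beyond the precision and are therefore kept at 0
// (assumed here to be the low-order bits of the least significant limb).
constexpr std::size_t UnusedLowBits(std::size_t precision_bits, std::size_t limb_bits) {
  return LimbCount(precision_bits, limb_bits) * limb_bits - precision_bits;
}
```

With 64-bit limbs, a precision of 128 bits needs 2 limbs with no unused bits, while a precision of 100 bits also needs 2 limbs and leaves 28 bits at zero.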

EDIT concerning the rationale: Manipulating limbs instead of bytes is for speed reasons. I suppose that little endian was chosen to represent the array of limbs for two reasons. First, it makes the basic operations (addition, subtraction, multiplication) easier to implement and potentially faster. Second, it is much better suited to implementing arithmetic modulo 2^K, in particular when K may change.
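Both reasons can be illustrated with a short sketch. It is not GMP's implementation: Limb, AddLimbs and ReduceMod2K are hypothetical names, limbs are assumed to be 32 bits wide without nails, and AddLimbs requires both inputs to have the same length. Carries travel from index 0 upward, in the same direction as the loop, and reducing modulo 2^K only touches the end of the array.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Limb = std::uint32_t;  // assumption: 32-bit limbs, no nails

// Adds two equal-length limb arrays (least significant limb first).
// Stores the low limbs in `sum` and returns the final carry (0 or 1).
Limb AddLimbs(const std::vector<Limb>& a, const std::vector<Limb>& b,
              std::vector<Limb>& sum) {
  sum.resize(a.size());
  Limb carry = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // A 64-bit temporary holds the full sum of two limbs plus the carry.
    const std::uint64_t t = std::uint64_t{a[i]} + b[i] + carry;
    sum[i] = static_cast<Limb>(t);
    carry = static_cast<Limb>(t >> 32);
  }
  return carry;
}

// Reduces x modulo 2^k in place by dropping the high limbs and masking
// the partial limb, if any. The low limbs are left untouched.
void ReduceMod2K(std::vector<Limb>& x, std::size_t k) {
  std::size_t full = k / 32;
  const std::size_t rest = k % 32;
  if (full >= x.size()) return;  // x is already below 2^k
  if (rest != 0) {
    x[full] &= (Limb{1} << rest) - 1;  // keep the low `rest` bits
    ++full;
  }
  x.resize(full);
}
```

For example, adding {0xffffffff, 0x00000001} and {1, 0} gives {0, 2} with a final carry of 0, and reducing {0xffffffff, 0xffffffff} modulo 2^40 gives {0xffffffff, 0x000000ff}.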

It finally clicked for me. The limb size does not affect the in-memory representation, at least on a little-endian machine.

On x86, which is little-endian, the data in GMP/MPFR is stored consistently in little-endian format, even when interpreted as a string of bytes across limbs. A word that has been loaded into a register, however, is printed most significant digit first, as if it were big-endian.

The inconsistent outcome when printing the limbs comes from how words are interpreted when read back from memory. When a word is loaded, its bytes are taken least significant first (how they are stored in memory), but the word is then printed most significant digit first. The bytes within each printed word therefore appear in reverse memory order, while the words themselves appear in memory order.
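This effect can be reproduced on a plain byte buffer, independent of GMP and of the host's byte order. DumpAsWords below is a hypothetical helper: it treats the buffer as consecutive little-endian words of a chosen size and prints each word most significant byte first, which is what printing a loaded word does on x86.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads `bytes` (in memory order) as consecutive
// little-endian words of word_bytes bytes each and formats every word as
// hex, most significant byte first, with '|' between words.
std::string DumpAsWords(const std::vector<std::uint8_t>& bytes,
                        std::size_t word_bytes) {
  std::string out;
  char buf[3];  // 2 hex digits plus the terminating NUL
  for (std::size_t w = 0; w + word_bytes <= bytes.size(); w += word_bytes) {
    if (w > 0) out += '|';
    // The most significant byte of a little-endian word is its last byte in
    // memory, so the bytes of each word are emitted in reverse memory order.
    for (std::size_t i = word_bytes; i-- > 0;) {
      std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[w + i]));
      out += buf;
    }
  }
  return out;
}
```

For the 16 bytes dd dd cc cc bb bb aa aa followed by eight zero bytes, reading 8-byte words gives aaaabbbbccccdddd|0000000000000000, and reading 4-byte words gives ccccdddd|aaaabbbb|00000000|00000000, which is the GMP line from the question.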

I've modified the example below to show that it is in fact the word size with which we reinterpret memory that affects how the content is printed; the output is the same whether 32-bit or 64-bit limbs are used:

#define __GMP_SHORT_LIMB
#include <gmp.h>
#include <mpfr.h>

#include <iomanip>
#include <iostream>
#include <cstdint>

constexpr int PRECISION = 128;

template <typename InterpretAs>
void PrintLimbs(mp_limb_t const *const limbs) {
  constexpr int LIMB_BITS = 8 * sizeof(InterpretAs); 
  constexpr int NUM_LIMBS = PRECISION / LIMB_BITS;
  std::cout << LIMB_BITS << "-bit: ";
  for (int i = 0; i < NUM_LIMBS; ++i) {
    const auto limb = reinterpret_cast<InterpretAs const *>(limbs)[i];
    for (int b = 0; b < LIMB_BITS; ++b) {
      if (b > 0 && b % 16 == 0) {
        std::cout << " ";
      }
      uint64_t bit = (limb >> (LIMB_BITS - 1 - b)) & 0x1; 
      std::cout << bit; 
    }
    if (i < NUM_LIMBS - 1) {
      std::cout << "|";
    }
  }
  std::cout << "\n";
}

int main() {
  uint64_t literal = 0b1111000000000000000000000000000000000000000000000000000000001001;
  {  // GMP
    mpz_t num;
    mpz_init2(num, PRECISION);
    mpz_set_ui(num, literal);
    std::cout << "GMP where limbs are interpreted as:\n";
    PrintLimbs<uint64_t>(num->_mp_d);
    PrintLimbs<uint32_t>(num->_mp_d);
    PrintLimbs<uint16_t>(num->_mp_d);
    mpz_clear(num);
  }
  {  // MPFR
    mpfr_t num;
    mpfr_init2(num, PRECISION);
    mpfr_set_ui(num, literal, MPFR_RNDN);
    std::cout << "MPFR where limbs are interpreted as:\n";
    PrintLimbs<uint64_t>(num->_mpfr_d);
    PrintLimbs<uint32_t>(num->_mpfr_d);
    PrintLimbs<uint16_t>(num->_mpfr_d);
    mpfr_clear(num);
  }
  return 0;
}

This prints (regardless of limb size):

GMP where limbs are interpreted as:
64-bit: 1111000000000000 0000000000000000 0000000000000000 0000000000001001|0000000000000000 0000000000000000 0000000000000000 0000000000000000
32-bit: 0000000000000000 0000000000001001|1111000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000000000
16-bit: 0000000000001001|0000000000000000|0000000000000000|1111000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000000000
MPFR where limbs are interpreted as:
64-bit: 0000000000000000 0000000000000000 0000000000000000 0000000000000000|1111000000000000 0000000000000000 0000000000000000 0000000000001001
32-bit: 0000000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000001001|1111000000000000 0000000000000000
16-bit: 0000000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000001001|0000000000000000|0000000000000000|1111000000000000
