How should GMP/MPFR limbs be interpreted?

Question

The arbitrary precision libraries GMP and MPFR use heap-allocated arrays of machine word-sized integers to store the limbs that make up the high precision number/mantissa.

How should this array of limbs be interpreted to recover the arbitrary precision integer number? In other words: for N limbs holding B bits each, how should I interpret them to recover the N*B bit number?

Does the limb size really affect the in-memory representation (see below)? If so, what is the rationale behind this?

Background:

I wrote a small program to look inside the representation, but I was confused by what I saw. The limbs seem to be ordered in most significant digit order, whereas the limbs themselves are in native least significant digit format. When representing the 64-bit word 0xAAAABBBBCCCCDDDD using 32-bit words and precision fixed to 128 bits, I see:

% c++ limbs.cpp -lgmp -lmpfr -o limbs && ./limbs
ccccdddd|aaaabbbb|00000000|00000000
00000000|00000000|ccccdddd|aaaabbbb

This seems to imply that the in-memory representation can not be read back as a string of bits to recover the arbitrary precision number (eg, if loaded this into a register on a machine that supported N*B sized words). Furthermore, this also seems to suggest that the limb size changes the representation, so that I would not be able to deserialize a number without knowing which limb size was used to serialize it.

Here's my test program (uses 32-bit limbs with the __GMP_SHORT_LIMB macro):

#define __GMP_SHORT_LIMB
#include <gmp.h>
#include <mpfr.h>

#include <iomanip>
#include <iostream>

constexpr int PRECISION = 128;

void PrintLimbs(mp_limb_t const *const limbs) {
  std::cout << std::hex;
  constexpr int NUM_LIMBS = PRECISION / (8 * sizeof(mp_limb_t));
  for (int i = 0; i < NUM_LIMBS; ++i) {
    std::cout << std::setfill('0') << std::setw(2 * sizeof(mp_limb_t)) << limbs[i];
    if (i < NUM_LIMBS - 1) {
      std::cout << "|";
    }
  }
  std::cout << "\n";
}

int main() {
  {  // GMP
    mpz_t num;
    mpz_init2(num, PRECISION);
    mpz_set_ui(num, 0xAAAABBBBCCCCDDDD);
    PrintLimbs(num->_mp_d);
    mpz_clear(num);
  }
  {  // MPFR
    mpfr_t num;
    mpfr_init2(num, PRECISION);
    mpfr_set_ui(num, 0xAAAABBBBCCCCDDDD, MPFR_RNDN);
    PrintLimbs(num->_mpfr_d);
    mpfr_clear(num);
  }
  return 0;
}

Answer 1

3 things that matter for the byte representation:

The limb size depends on your machine and the chosen ABI. The real size is also affected by the optional presence of nails (an experimental feature, thus it is unlikely that limbs have nails). MPFR does not support the presence of nails.
The limb representation in memory follows the endianness of the machine.
Limbs are stored least significant limb first (aka little endian).

Note that from the last two points, on a same big-endian machine, the byte representation of the array will depend on the limb size.

Concerning the size of the array of limbs, it depends on the type. For instance, with the mpn layer of GMP, it is entirely handled by the user.

For MPFR, the size is deduced from the precision of the mpfr_t object; and if the precision is not a multiple of the limb bitsize, the trailing bits are always set to 0. Note also that more memory may be allocated than the one actually used, and it must not be confused with the size of the array; you can ignore this fact, as the unused data are always after the actual array of limbs.

EDIT concerning the rationale: Manipulating limbs instead of bytes is for speed reasons. Then I suppose that little endian has been chosen to represent the array of limbs for two reasons. First, it makes the basic operations (addition, subtraction, multiplication) easier to implement and potentially faster. Second, this is much better to implement arithmetic modulo 2^ K , in particular when K may change.

Answer 2

It finally clicked for me. The limb size does not affect the in-memory representation.

The data in GMP/MPFR is stored consistently in little-endian format, even when interpreted as a string of bytes across limbs. But registers on x86 are big-endian.

The inconsistent outcome when printing the limbs comes from how words are interpreted when read back from memory. When loaded into a register, memory is reinterpreted from little-endian (how it is stored in memory) to big-endian (how it is stored in registers).

I've modified the example below to show how it is in fact the word size with which we reinterpret memory that affects how the content is printed, as the output will be the same no matter if 32-bit or 64-bit limbs are used:

#define __GMP_SHORT_LIMB
#include <gmp.h>
#include <mpfr.h>

#include <iomanip>
#include <iostream>
#include <cstdint>

constexpr int PRECISION = 128;

template <typename InterpretAs>
void PrintLimbs(mp_limb_t const *const limbs) {
  constexpr int LIMB_BITS = 8 * sizeof(InterpretAs); 
  constexpr int NUM_LIMBS = PRECISION / LIMB_BITS;
  std::cout << LIMB_BITS << "-bit: ";
  for (int i = 0; i < NUM_LIMBS; ++i) {
    const auto limb = reinterpret_cast<InterpretAs const *>(limbs)[i];
    for (int b = 0; b < LIMB_BITS; ++b) {
      if (b > 0 && b % 16 == 0) {
        std::cout << " ";
      }
      uint64_t bit = (limb >> (LIMB_BITS - 1 - b)) & 0x1; 
      std::cout << bit; 
    }
    if (i < NUM_LIMBS - 1) {
      std::cout << "|";
    }
  }
  std::cout << "\n";
}

int main() {
  uint64_t literal = 0b1111000000000000000000000000000000000000000000000000000000001001;
  {  // GMP
    mpz_t num;
    mpz_init2(num, PRECISION);
    mpz_set_ui(num, literal);
    std::cout << "GMP where limbs are interpreted as:\n";
    PrintLimbs<uint64_t>(num->_mp_d);
    PrintLimbs<uint32_t>(num->_mp_d);
    PrintLimbs<uint16_t>(num->_mp_d);
    mpz_clear(num);
  }
  {  // MPFR
    mpfr_t num;
    mpfr_init2(num, PRECISION);
    mpfr_set_ui(num, literal, MPFR_RNDN);
    std::cout << "MPFR where limbs are interpreted as:\n";
    PrintLimbs<uint64_t>(num->_mpfr_d);
    PrintLimbs<uint32_t>(num->_mpfr_d);
    PrintLimbs<uint16_t>(num->_mpfr_d);
    mpfr_clear(num);
  }
  return 0;
}

This prints (regardless of limb size):

GMP where limbs are interpreted as:
64-bit: 1111000000000000 0000000000000000 0000000000000000 0000000000001001|0000000000000000 0000000000000000 0000000000000000 0000000000000000
32-bit: 0000000000000000 0000000000001001|1111000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000000000
16-bit: 0000000000001001|0000000000000000|0000000000000000|1111000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000000000
MPFR where limbs are interpreted as:
64-bit: 0000000000000000 0000000000000000 0000000000000000 0000000000000000|1111000000000000 0000000000000000 0000000000000000 0000000000001001
32-bit: 0000000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000001001|1111000000000000 0000000000000000
16-bit: 0000000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000001001|0000000000000000|0000000000000000|1111000000000000

How should GMP/MPFR limbs be interpreted?

Question

Background:

2 answers

solution1
1 2021-02-04 11:02:25

solution2
0 ACCPTED 2021-02-05 10:43:12

How should GMP/MPFR limbs be interpreted?

Question

Background:

2 answers

solution1 1 2021-02-04 11:02:25

solution2 0 ACCPTED 2021-02-05 10:43:12

solution1
1 2021-02-04 11:02:25

solution2
0 ACCPTED 2021-02-05 10:43:12