
Converting a big integer to a decimal string

At the risk of having this question marked as a duplicate, or even closed, I have a question that has come up.

Background

For "normal" data types such as int, long long, etc., you would convert a binary numeric value to a decimal string as follows (in pseudocode):

Set length = 0
Set divisor to the largest power of 10 the data type will hold.
  Loop while divisor > 0
    Set digit = number / divisor (integer division).
    Place '0' + digit in the string at position length.
    Increment the length by 1.
    Set number = number mod divisor.
    Divide the divisor by 10.
Strip leading zeros (keeping at least one digit).
Print the string.
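For a fixed-width type, the divisor method can be sketched in C as below. This is an illustrative version assuming unsigned int; the function name utoa_divisor is hypothetical. It computes the largest power of 10 that fits in the type instead of hard-coding it, and suppresses leading zeros. Because the digits come out most significant first, no reversal is needed.

```c
#include <limits.h>

/* Hypothetical helper: convert x to decimal by dividing by descending
   powers of 10. out must hold at least 11 bytes for a 32-bit unsigned. */
void utoa_divisor(unsigned x, char *out) {
    unsigned divisor = 1;
    while (divisor <= UINT_MAX / 10)   /* largest power of 10 <= UINT_MAX */
        divisor *= 10;

    char *p = out;
    int started = 0;                   /* set once the first nonzero digit is written */
    for (; divisor > 0; divisor /= 10) {
        unsigned digit = x / divisor;  /* most significant remaining digit */
        x %= divisor;                  /* drop that digit from x */
        if (digit != 0 || started || divisor == 1) {
            *p++ = (char)('0' + digit);
            started = 1;
        }
    }
    *p = '\0';
}
```

The divisor == 1 check makes sure that x == 0 still produces "0".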

The actual implementation in almost any language is quite trivial.

The Problem

The issue I am encountering with the above method is that with big integers (also known as arbitrary-precision arithmetic), there is no largest power of 10 to start with. So the question is: "How do you initialize the divisor to the largest possible power of 10 if there is no way to know what that value is?"

What I Have Tried

Still trying to draft a solution.

Research

Some of the links that I have found here include the following:

Convert a "big" Hex number (string format) to a decimal number (string format) without BigInteger Class

C: print a BigInteger in base 10

Fastest way to convert a BigInteger to a decimal (Base 10) string?


A Google search turned up other things, but nothing that specifically answers my question.

Ideas

One method that I think might work is as follows (in pseudocode):

Define p_divisor as the previous divisor.
Set divisor = 1
Set p_divisor = 1
Set length = 0
  Loop:
    if divisor < dividend
      then
        Set p_divisor = divisor
        divisor = divisor * 10
      else
        end loop
  if divisor > dividend then Set divisor = p_divisor
  Loop:
    Divide number in question by divisor; place the quotient digit in the string at position length.
    Set number = number mod divisor.
    Increment the length by 1.
    if divisor == 1 then end loop
    Divide the divisor by 10.
Print the string.

Would this be the correct way? I have a big integer library up and working (including multiplication and division), so it wouldn't be that hard to pull this off. The big issue I see with this method is performance: you have to run a multiplication sequence to get the initial divisor, and then you have to divide twice for each base-10 position, once for the actual division and once for the divisor.

One (fairly common) way to do this, whether for big integer or normal integer types, is to repeatedly divide the number by 10, saving the remainder as the next digit (starting with the least significant). Keep going until the number reaches zero. Since the first digit found is the least significant, you may need to reverse the string at the end, or build it in reverse as you go.

An example using an ordinary unsigned int might look like this:

#include <limits.h>  /* CHAR_BIT */
#include <stdio.h>   /* puts */

void printUInt(unsigned x) {
  char buf[(sizeof(x) * CHAR_BIT) / 3 + 2]; // slightly oversize buffer
  char *result = buf + sizeof(buf) - 1;      // starts at the last byte, where the null goes

  // add digits to result, starting at
  //   the end (least significant digit)

  *result = '\0'; // terminating null
  do {
    *--result = '0' + (x % 10);  // remainder gives the next digit
    x /= 10;
  } while (x); // keep going until x reaches zero

  puts(result);
}

The process is pretty much the same for a big integer, though it is best to do the division and get the remainder in one step if you can.

The above example builds the string from the end of the buffer (so result ends up pointing somewhere in the middle of the buffer), but you could also build it from the start and reverse it afterward.
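A rough sketch of the same loop for a big integer follows. It assumes the magnitude is stored as an array of little-endian 32-bit limbs (a hypothetical layout; real libraries differ), and the names divmod10 and big_to_dec_slow are hypothetical. divmod10 computes the quotient and the remainder in a single pass over the limbs, and the string is built backwards from the end of the buffer just like printUInt.

```c
#include <stddef.h>
#include <stdint.h>

/* Divide the little-endian base-2^32 number limbs[0..n-1] by 10 in place
   and return the remainder (the next decimal digit). */
static unsigned divmod10(uint32_t *limbs, size_t n) {
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {            /* most significant limb first */
        uint64_t cur = (rem << 32) | limbs[i];
        limbs[i] = (uint32_t)(cur / 10);       /* quotient limb */
        rem = cur % 10;                        /* carried into the next limb */
    }
    return (unsigned)rem;
}

/* Convert limbs to decimal; the limbs are destroyed in the process.
   Digits are written backwards from the end of out (cap bytes, which must
   fit every digit plus the null). Returns the start of the string. */
char *big_to_dec_slow(uint32_t *limbs, size_t n, char *out, size_t cap) {
    char *p = out + cap - 1;
    *p = '\0';
    do {
        *--p = (char)('0' + divmod10(limbs, n));
        while (n > 0 && limbs[n - 1] == 0)     /* drop leading zero limbs */
            --n;
    } while (n > 0);
    return p;
}
```

Each pass touches every remaining limb, which is why this approach is quadratic. A common refinement is to divide by 10^9 instead of 10 and emit nine zero-padded digits per remainder, which cuts the number of passes by roughly a factor of nine.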

You can estimate the size needed for the output if you can determine the number of bits used in your original number (slightly less than 1 decimal digit per 3 bits).
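As a worked version of that estimate: a b-bit value is at most 2^b - 1, which has at most floor(b * log10(2)) + 1 decimal digits, with log10(2) ≈ 0.30103. The hypothetical helper below uses the integer ratio 28/93 ≈ 0.30108, which is slightly larger, so the bound never comes out too small. Add one byte for the terminating null, and one for a sign if needed.

```c
#include <stddef.h>

/* Upper bound on the decimal digits of any value that fits in `bits` bits.
   28/93 slightly exceeds log10(2), so this never underestimates
   (bits * 28 must not overflow size_t). */
size_t max_decimal_digits(size_t bits) {
    return bits * 28 / 93 + 1;
}
```

For 32 bits this gives 10 and for 64 bits it gives 20, matching the lengths of 4294967295 and 18446744073709551615.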

The accepted answer already provides you with a simple way to do this. That works fine and gives you a nice result. However, if you really need to convert large values to a string, there is a better way.

I will not go into details, because my solution is written in Delphi, which many readers can't easily read, and it is pretty long (several functions in 100+ lines of code, using yet other functions, etc., which cannot be explained in a simple answer, especially because the conversion handles some number bases differently).

But the principle is to split the number into two halves of almost equal size by dividing it by a power of 10 (the quotient and the remainder are the two halves). To convert these, recursively cut them into two smaller parts again, using a smaller power of 10, and so on, until the size of the parts reaches some lower limit (say, 32 bits). You then convert those parts the conventional way, i.e. as in the accepted answer.

The partial conversions are then "concatenated" (actually, the digits are placed directly into a single buffer at the correct address), so at the end you get one huge string of digits.
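To make the splitting concrete, here is a small C sketch of the recursion. It uses a plain uint64_t as a stand-in for the big integer (an assumption made only to keep it short), so it shows the structure, not the speedup: split by a power of 10, write each half directly into its own slot of one buffer, and zero-pad the lower half. A real implementation would split big integers and precompute the powers of 10; the names dc_digits and u64_to_dec_dc are hypothetical.

```c
#include <stdint.h>

/* Write exactly `width` digits of x, zero-padded, into out[0..width-1].
   Requires x < 10^width. */
static void dc_digits(uint64_t x, char *out, unsigned width) {
    if (width <= 4) {                  /* base case: plain repeated division */
        for (unsigned i = width; i-- > 0; ) {
            out[i] = (char)('0' + x % 10);
            x /= 10;
        }
        return;
    }
    unsigned low = width / 2;          /* digits in the lower half */
    uint64_t pow10 = 1;
    for (unsigned i = 0; i < low; ++i)
        pow10 *= 10;
    /* x = hi * 10^low + lo; each half goes straight to its place in out */
    dc_digits(x / pow10, out, width - low);
    dc_digits(x % pow10, out + (width - low), low);
}

/* Convert x to a decimal string; out needs at least 21 bytes. */
void u64_to_dec_dc(uint64_t x, char *out) {
    unsigned width = 1;
    for (uint64_t t = x; t >= 10; t /= 10)  /* count the digits */
        ++width;
    dc_digits(x, out, width);
    out[width] = '\0';
}
```

Because the lower half is always written with its full width, zeros inside the number (as in 100000) are preserved without any explicit concatenation step.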

This is a bit tricky, and I only mention it for those who want to investigate this for extremely large numbers. It doesn't make sense for numbers with fewer than, say, 100 digits.

This is a recursive method, indeed, but not one that simply divides by 10.

The size of the buffer can be precalculated by doing something like:

bufSize = myBigInt.bitCount() * Math.log10(2) + some_extra_to_be_sure;

I use a precalculated table for the different number bases, but that is an implementation detail.

For very large numbers, this will be much faster than a loop that repeatedly divides by 10, especially because with that loop the entire number must be divided by 10 every time, and it only gets smaller very slowly. The divide-and-conquer algorithm only divides ever smaller numbers, and the total number of (costly) divisions needed to cut the parts is far lower (log N instead of N, is my guess). So there are fewer divisions, on (on average) much smaller numbers.

cf. Brent, Zimmermann, "Modern Computer Arithmetic", algorithm 1.26

My code and explanations can be found here, if you want to see how it works: BigIntegers unit

I came across a similar problem and did not find any solution to my liking, so I came up with my own. The idea is to convert your BigInt, whatever its base, into another BigInt whose base is a power of 10, as large as possible but still smaller than your current base. You can then convert each "digit" using ordinary library calls and concatenate the results. So no explicit division is ever involved; it is only hidden in the system library functions. Still, the overall complexity is quadratic (just like the other division-based solutions).

friend std::ostream& operator<<(std::ostream& out, const BigInt_impl& x){
    using Big10 = BigInt_impl<char32_t, uint64_t, 1000000000>; // 1e9 is the max power of 10 smaller than BASE
    auto big10 = Big10(0);
    auto cm = Big10(1);                       // BASE^i, the weight of x.digits[i]
    for(size_t i = 0; i < x.digits.size(); ++i, cm *= BASE){
        big10 += cm*x.digits[i];              // accumulate x in base 10^9
    }
    // cast: streaming a char32_t directly is ill-formed in C++20
    out << static_cast<uint32_t>(big10.digits.back());   // leading chunk, no padding
    for(auto it = next(big10.digits.rbegin()); it != big10.digits.rend(); ++it){
        out << std::setfill('0') << std::setw(9) << static_cast<uint32_t>(*it); // 9 digits per chunk
    }
    return out;
}

Watch out for the magic constant 1e9 in this solution: it is only correct for my case of BASE = 2^32. I was too lazy to derive it properly.

(Sorry for the C++; I only just realized the question was about C, but I would still like to leave the code here, maybe as an illustration of the idea.)
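For readers who want the same idea in C, here is a rough sketch. It assumes the big integer is stored as little-endian 32-bit limbs (so BASE = 2^32 and 10^9 is the largest power of 10 below it, as in the C++ code), and it builds the base-10^9 representation with Horner's rule, most significant limb first, rather than summing digit * BASE^i; both give the same value. The function name and storage layout are hypothetical, and unlike the C++ version the divisions by 10^9 are written out explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK 1000000000u  /* 10^9, the largest power of 10 below 2^32 */

/* Convert n little-endian base-2^32 limbs to a decimal string.
   dec: scratch space for at least 2*n + 1 base-10^9 chunks.
   out: at least 10*n + 2 bytes. */
void big_to_dec_chunked(const uint32_t *limbs, size_t n,
                        uint32_t *dec, char *out) {
    size_t len = 1;
    dec[0] = 0;
    for (size_t i = n; i-- > 0; ) {         /* Horner: dec = dec * 2^32 + limb */
        uint64_t carry = limbs[i];
        for (size_t j = 0; j < len; ++j) {
            uint64_t cur = ((uint64_t)dec[j] << 32) + carry;
            dec[j] = (uint32_t)(cur % CHUNK);
            carry = cur / CHUNK;
        }
        while (carry != 0) {                 /* the value needs more chunks */
            dec[len++] = (uint32_t)(carry % CHUNK);
            carry /= CHUNK;
        }
    }
    /* leading chunk unpadded, every later chunk zero-padded to 9 digits */
    out += sprintf(out, "%u", (unsigned)dec[len - 1]);
    for (size_t j = len - 1; j-- > 0; )
        out += sprintf(out, "%09u", (unsigned)dec[j]);
}
```

The zero padding matters: a chunk such as 1 in the middle of the number must print as 000000001, which is what the setw(9)/setfill('0') pair does in the C++ version.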

Would this be the correct way?

The 2nd method does not work for all integer values in C. "if divisor < dividend" relies on creating divisor as a power of 10 greater than (or equal to) the dividend. Since most integer systems have a finite range, creating a power of 10 greater than or equal to dividend is not possible when dividend == INTEGER_MAX (unless INTEGER_MAX is a power of 10).
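One way around this for a fixed-width type is to compare the divisor against dividend / 10 before multiplying, so the loop stops at the largest power of 10 that is less than or equal to the value and never needs a larger one. This is a sketch assuming unsigned int; the function name largest_pow10_le is hypothetical.

```c
#include <limits.h>

/* Largest power of 10 that is <= x (1 for x == 0). Checking d <= x / 10
   before multiplying means d never exceeds x, so nothing overflows,
   even for x == UINT_MAX. */
unsigned largest_pow10_le(unsigned x) {
    unsigned d = 1;
    while (d <= x / 10)
        d *= 10;
    return d;
}
```

With a 32-bit unsigned int, largest_pow10_le(UINT_MAX) returns 1000000000, which is the starting divisor the digit loop needs.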


A recursive method works by performing repeated division by 10 and deferring the digit assignment until the more significant digits are determined. This approach works well when the size of the destination buffer is unknown, yet adequate.

The code below handles signed int and also works for INT_MIN without undefined behavior.

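#include <limits.h>  /* CHAR_BIT, INT_MIN */
#include <stdio.h>   /* puts */

// Return location of next char to write
// Note: value is expected to be <= 0
static char *itoa_helper(char *s, int value) {
  if (value/10) {
    s = itoa_helper(s, value/10);  // emit the more significant digits first
  }
  *s = '0' - value % 10;  // C99: % truncates toward zero, so value % 10 is in -9..0
  return s+1;
}

void itoa(int n, char *s) {
  if (n < 0) {
    *s++ = '-';
  } else {
    n = -n;  // work with non-positive values so INT_MIN never needs negating
  }
  *itoa_helper(s, n) = '\0';
}

// Room for the digits, a sign and the terminating null
#define INT_SIZEMAX  ((CHAR_BIT*sizeof(int) - 1)*28/93 + 3)

int main(void) {
  char buf[INT_SIZEMAX];
  itoa(INT_MIN, buf);
  puts(buf);
  return 0;
}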

Rather than converting negative numbers to positive ones, this code does the opposite, because -INT_MIN overflows (undefined behavior) on most systems.
