Efficiently convert between Hex, Binary, and Decimal in C/C++

I have 3 base representations for positive integer numbers:

  1. Decimal, in an unsigned long variable (e.g. unsigned long int NumDec = 200)
  2. Hex, in a string variable (e.g. string NumHex = "C8")
  3. Binary, in a string variable (e.g. string NumBin = "11001000")

I want to be able to convert between numbers in all 3 representations in the most efficient way, i.e. to implement the following 6 functions:

unsigned long int Binary2Dec(const string & Bin) {}
unsigned long int Hex2Dec(const string & Hex) {}
string Dec2Hex(unsigned long int Dec) {}
string Binary2Hex(const string & Bin) {}
string Dec2Binary(unsigned long int Dec) {}
string Hex2Binary(const string & Hex) {}

What is the most efficient approach for each of them? I can use C and C++, but not boost.

Edit: By "efficiency" I mean time efficiency: shortest execution time.

As others have pointed out, I would start with sscanf(), printf() and/or strtoul(). They are fast enough for most applications, and they are less likely to have bugs. I will say, however, that these functions are more generic than you might expect, as they have to deal with non-ASCII character sets, with numbers represented in any base and so forth. For some domains it is possible to beat the library functions.

So, measure first, and if the performance of these conversions is really an issue, then:

1) In some applications/domains certain numbers appear very often. Values such as zero, 100, 200, or 19.95 may be so common that it makes sense to optimize your functions to convert them with a bunch of if() statements, and then fall back to the generic library functions.
2) Use a table lookup for the 100 most common numbers, and then fall back on a library function. Remember that large tables may not fit in your cache and may require multiple indirections for shared libraries, so measure these things carefully to make sure you are not decreasing performance.
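A minimal sketch of the table-plus-fallback idea from point 2, assuming the common values are simply 0 to 99 (an arbitrary choice for illustration). `SmallHexTable` is a hypothetical helper, not something from the question, and whether this beats a plain snprintf() call is exactly what you would have to measure:

```cpp
#include <array>
#include <cstdio>
#include <string>

// Hypothetical fast path: hex strings for 0..99 are built once on first use.
// The cutoff of 100 is an assumption; pick it from your own measurements.
static const std::array<std::string, 100>& SmallHexTable() {
    static const std::array<std::string, 100> table = [] {
        std::array<std::string, 100> t;
        for (unsigned i = 0; i < t.size(); ++i) {
            char buf[8];
            std::snprintf(buf, sizeof buf, "%X", i);
            t[i] = buf;
        }
        return t;
    }();
    return table;
}

std::string Dec2Hex(unsigned long dec) {
    if (dec < 100) return SmallHexTable()[dec];  // common case: table lookup
    char buf[2 * sizeof(unsigned long) + 1];     // all hex digits plus the NUL
    std::snprintf(buf, sizeof buf, "%lX", dec);  // generic library fallback
    return buf;
}
```

Here Dec2Hex(99) is served from the table, while Dec2Hex(200) goes through snprintf().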

You may also want to look at the boost lexical_cast functions, though in my experience they are relatively slow compared to the good old C functions.

Though many have said it, it is worth repeating over and over: do not optimize these conversions until you have evidence that they are a problem. If you do optimize, measure your new implementation to make sure it is faster, and make sure you have a ton of unit tests for your own version, because you will introduce bugs :-(

I would suggest just using sprintf and sscanf.

Also, if you're interested in how it's implemented you can take a look at the source code for glibc, the GNU C Library.
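For the sprintf/sscanf suggestion, a minimal sketch of the hex pair might look like the following. It uses snprintf rather than sprintf to bound the write, and returning 0 when parsing fails is an assumption made here for brevity, not something the question specifies:

```cpp
#include <cstdio>
#include <string>

std::string Dec2Hex(unsigned long dec) {
    char buf[2 * sizeof(unsigned long) + 1];  // all hex digits plus the NUL
    std::snprintf(buf, sizeof buf, "%lX", dec);
    return buf;
}

unsigned long Hex2Dec(const std::string& hex) {
    unsigned long value = 0;
    // Assumption: report 0 when the text does not start with a hex digit.
    if (std::sscanf(hex.c_str(), "%lx", &value) != 1) return 0;
    return value;
}
```

Note that neither family has a portable conversion for binary text, so the binary functions need another approach.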

Why do these routines have to be so time-efficient? That sort of claim always makes me wonder. Are you sure the obvious conversion methods like strtol() are too slow, or that you can do better? System functions are usually pretty efficient. They are sometimes slower to support generality and error-checking, but you need to consider what to do with errors. If a bin argument has characters other than '0' and '1', what then? Abort? Propagate massive errors?

Why are you using "Dec" to represent the internal representation? Dec, Hex, and Bin should be used to refer to the string representations. There's nothing decimal about an unsigned long. Are you dealing with strings showing the number in decimal? If not, you're confusing people here and are going to confuse many more.

The transformation between binary and hex text formats can be done quickly and efficiently, with lookup tables, but anything involving decimal text format will be more complicated.
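As a rough sketch of the text-to-text case: each hex digit corresponds to exactly four bits, so a 16-entry table is enough and no integer ever has to be built. The table name `kNibble`, the empty-string result for invalid input, and the ASCII assumption for letter ranges are all choices made for this illustration:

```cpp
#include <cstddef>
#include <string>

std::string Hex2Binary(const std::string& hex) {
    static const char* const kNibble[16] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"};
    std::string bin;
    bin.reserve(hex.size() * 4);
    for (char c : hex) {
        int v = (c >= '0' && c <= '9') ? c - '0'
              : (c >= 'a' && c <= 'f') ? c - 'a' + 10
              : (c >= 'A' && c <= 'F') ? c - 'A' + 10 : -1;
        if (v < 0) return std::string();  // assumption: empty string on bad input
        bin += kNibble[v];
    }
    return bin;
}

std::string Binary2Hex(const std::string& bin) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string hex;
    // Treat the input as if it were left-padded with zeros to a multiple of 4.
    std::size_t pad = (4 - bin.size() % 4) % 4;
    int nibble = 0;
    for (std::size_t i = 0; i < bin.size(); ++i) {
        if (bin[i] != '0' && bin[i] != '1') return std::string();
        nibble = (nibble << 1) | (bin[i] - '0');
        if ((i + pad) % 4 == 3) {  // a full group of four bits is complete
            hex += kDigits[nibble];
            nibble = 0;
        }
    }
    return hex;
}
```

Hex2Binary keeps leading zeros, so "1" becomes "0001". Strip them afterwards if you need a minimal form.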

That depends on what you're optimizing for. What do you mean by "efficient"? Is it important that the conversions be fast, use little memory, little programmer time, fewer WTFs from other programmers reading the code, or what?

For readability and ease of implementation, you should at least implement both Hex2Dec() and Binary2Dec() by just calling strtoul(). That makes them into one-liners, which is very efficient for at least some of the above interpretations of the word.
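Since strtoul() parses text into an unsigned long, it is the two string-to-number functions, Hex2Dec() and Binary2Dec(), that reduce to one-liners. A sketch, with error handling deliberately left out (strtoul() simply stops at the first character that is not a valid digit):

```cpp
#include <cstdlib>
#include <string>

unsigned long Hex2Dec(const std::string& hex) {
    return std::strtoul(hex.c_str(), nullptr, 16);  // base 16
}

unsigned long Binary2Dec(const std::string& bin) {
    return std::strtoul(bin.c_str(), nullptr, 2);   // base 2
}
```

If you need to detect bad input, pass an end pointer instead of nullptr and check that it reached the end of the string.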

Sounds very much like a homework problem, but what the heck...

The short answer for converting from long int to your strings is to use two lookup tables. Each table should have 256 entries. One maps a byte to a hex string: 0 -> "00", 1 -> "01", etc. The other maps a byte to a bit string: 0 -> "00000000", 1 -> "00000001".

Then for each byte in your long int you just have to look up the correct string, and concatenate them.
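The two tables and the per-byte concatenation could be sketched as follows. The names `ByteTables`, `Tables` and `FromTable` are hypothetical, the code assumes 8-bit bytes, and skipping leading zero bytes is a choice made here so the result does not depend on the width of unsigned long:

```cpp
#include <array>
#include <string>

// Hypothetical holder for the two 256-entry tables described above.
struct ByteTables {
    std::array<std::string, 256> hex;  // 0 -> "00", 1 -> "01", ...
    std::array<std::string, 256> bin;  // 0 -> "00000000", 1 -> "00000001", ...
    ByteTables() {
        static const char kDigits[] = "0123456789ABCDEF";
        for (unsigned b = 0; b < 256; ++b) {
            hex[b] += kDigits[b >> 4];
            hex[b] += kDigits[b & 0xF];
            bin[b].resize(8);
            for (int bit = 0; bit < 8; ++bit)
                bin[b][bit] = ((b >> (7 - bit)) & 1) ? '1' : '0';
        }
    }
};

static const ByteTables& Tables() {
    static const ByteTables t;  // built once, on first use
    return t;
}

// Walk the bytes from most to least significant and concatenate table entries.
static std::string FromTable(const std::array<std::string, 256>& table,
                             unsigned long dec) {
    std::string out;
    bool started = false;
    for (int shift = static_cast<int>(sizeof(dec) - 1) * 8; shift >= 0; shift -= 8) {
        unsigned byte = static_cast<unsigned>((dec >> shift) & 0xFF);
        if (byte == 0 && !started && shift != 0) continue;  // skip leading zero bytes
        started = true;
        out += table[byte];
    }
    return out;
}

std::string Dec2Hex(unsigned long dec)    { return FromTable(Tables().hex, dec); }
std::string Dec2Binary(unsigned long dec) { return FromTable(Tables().bin, dec); }
```

Because the tables work a byte at a time, the output is always a whole number of bytes wide: 5 comes out as "05" in hex and "00000101" in binary.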

To convert from strings back to long you can simply convert the hex string and the bit string back to a decimal number by multiplying the numeric value of each character by the appropriate power of 16 or 2, and summing up the results.

EDIT: You can also use the same lookup tables for backwards conversion by doing binary search to find the right string. This would take log2(256) = 8 comparisons of your strings. Unfortunately I don't have time to do the analysis whether comparing strings would be much faster than multiplying and adding integers.

Let's think about half of the task for a moment - converting from a string-ized base n to unsigned long, where n is a power of 2 (base 2 for binary and base 16 for hex).

If your input is sane, then this work is nothing more than a compare, a subtract, a shift and an or per digit. If your input is not sane, well, that's where it gets ugly, doesn't it? Doing the conversion superfast is not hard. Doing it well under all circumstances is the challenge.

So let's assume that your input is sane, then the heart of your conversion is this:

#include <ctype.h>

unsigned long PowerOfTwoFromString(const char *input, int shift)
{
    unsigned long val = 0;
    /* One past the last valid letter digit: 'g' for hex. For bases up to 8
       this is below 'a', so the letter branch is never taken. */
    int upperLimit = 'a' + (1 << shift) - 10;
    while (*input) {
        int c = tolower((unsigned char)*input++);
        unsigned long digit = (c >= 'a' && c < upperLimit) ? c - 'a' + 10 : c - '0';
        val = (val << shift) | digit;
    }
    return val;
}

#define UlongFromBinaryString(str) PowerOfTwoFromString(str, 1)
#define UlongFromHexString(str) PowerOfTwoFromString(str, 4)

See how easy that is? And it will fail on non-sane inputs. Most of your work is going to go into making your input sane, not performance.

Now, this code takes advantage of power-of-two shifting. It's easy to extend to base 4, base 8, base 32, etc. It won't work on non-power-of-two bases. For those, your math has to change. You get

val = (val * base) + digit

which is conceptually the same for this set of operations. The multiplication by the base is going to be equivalent to the shift. So I'd be as likely to use a fully general routine instead. And sanitize the code while sanitizing the inputs. And at that point, strtoul is probably your best bet. Here's a link to a version of strtoul. Nearly all the work is handling edge conditions - that should clue you in on where your energies should be focused: correct, resilient code. The savings from using bit shifts are going to be minimal compared to the savings from, say, not crashing on bad input.
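To make the point about edge conditions concrete, here is a sketch of the general val = (val * base) + digit loop with the checks added. `ParseUnsigned` and its bool-plus-out-parameter interface are hypothetical choices for this illustration, and the letter ranges assume an ASCII-compatible character set:

```cpp
#include <climits>
#include <string>

// Hypothetical helper: parses `text` as an unsigned number in `base` (2..36).
// Returns false on empty input, an invalid digit, or overflow; `out` is only
// written on success.
bool ParseUnsigned(const std::string& text, unsigned base, unsigned long& out) {
    if (text.empty() || base < 2 || base > 36) return false;
    unsigned long val = 0;
    for (char c : text) {
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'z') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'Z') digit = c - 'A' + 10;
        else return false;                 // not a digit in any base
        if (digit >= base) return false;   // e.g. '2' in a binary string
        if (val > (ULONG_MAX - digit) / base) return false;  // would overflow
        val = val * base + digit;          // the same step as above
    }
    out = val;
    return true;
}
```

Most of the lines are checks rather than arithmetic, which is the same pattern you see in a real strtoul implementation.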

Why not just use a macro that also takes the format as an input? If you are in C, at least.

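/* #format stringizes the argument, so TO_STRING(buf,%d,i) becomes sprintf(buf, "%d", i). */
#define TO_STRING( string, format, data) \
sprintf( string, #format, data)
// Int
TO_STRING(buf,%d,i);
// Hex ( Two char representation )
TO_STRING(buf,%02x,i);
// Binary (%b is not a standard conversion before C23; check your library)
TO_STRING(buf,%b,i);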

Or you can use sprintf directly, or you can have multiple macros:

#define INT_STRING( buf, data) \
sprintf( buf, "%d", data)
#define HEX_STRING( buf, data) \
sprintf( buf, "%x", data)
#define BIN_TO_STRING( buf, data) \
sprintf( buf, "%b", data)

BIN_TO_STRING( loc_buf, my_bin );
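One caveat on the binary macros: %b is not a standard printf conversion before C23, so it may not be available with your library. A minimal hand-rolled Dec2Binary, assuming you want the shortest form without leading zeros, could look like this:

```cpp
#include <climits>
#include <string>

std::string Dec2Binary(unsigned long dec) {
    char buf[sizeof(unsigned long) * CHAR_BIT];  // one char per bit, no NUL needed
    char* end = buf + sizeof buf;
    char* p = end;
    do {
        *--p = static_cast<char>('0' + (dec & 1));  // emit the lowest bit
        dec >>= 1;
    } while (dec != 0);                             // do/while so that 0 yields "0"
    return std::string(p, end);
}
```

The buffer is filled from the back, which avoids reversing the string afterwards.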
