Use of 64-bit types?

Original post: 2011-01-12 17:49:07 5 4 c++/ 64-bit/ hash/ int64

I am writing some hash functions for a compiler, and I use the __int64 datatype frequently. The compiler is intended to be supported (and so far is) on different OSes. I know that __int64 is a type that most major C++ compilers for my target systems can compile, so that's not the problem. I am using hash functions to make large character strings smaller and quicker to compare, and they work wonders on 64-bit capable OSes. But would there be a large enough performance decrease on 32-bit OSes to cancel out the benefits? I could use 32-bit integers, but that would greatly lessen the effectiveness of the hash functions.

Edit: It is custom code and very simple. The first hash function generates a unique 64-bit int from 12 alphanumeric (including underscore) characters. Then a class handles hashes of more than 12 characters by creating address-linked lists of 64-bit hashes, and it overloads the comparison operators. The overloaded comparisons short-circuit as they walk down the address-linked list. I've run tests on my machine comparing randomly generated large hashes (100 - 300 characters) against themselves (the worst-case scenario), and this proved to be faster than string compares. In order to better simulate the overhead of generating hashes, I've also run compare tests of pre-generated large hashes against themselves. This is all running with code optimization turned off. With ~1 billion hash compares vs. ~1 billion string compares, the hashes took around 16% of the time. This was all in a 64-bit environment though. I don't have a 32-bit machine to run tests with.
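The post does not show the actual hash, so the following is only a sketch of one way 12 such characters could map to a distinct 64-bit value. It assumes a case-insensitive alphabet (a case-sensitive one has 63 symbols, and 12 of them would need about 72 bits). The names `symbol_code`, `pack12` and `ChunkedHash` are made up for illustration, and a `std::vector` stands in for the address-linked list described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical alphabet of 38 symbols: 0 = padding, 1-10 = digits,
// 11-36 = letters (case folded), 37 = underscore. Assumes an ASCII-like
// character set where letters are contiguous.
inline std::uint64_t symbol_code(char c) {
    if (c >= '0' && c <= '9') return 1 + static_cast<std::uint64_t>(c - '0');
    if (c >= 'a' && c <= 'z') return 11 + static_cast<std::uint64_t>(c - 'a');
    if (c >= 'A' && c <= 'Z') return 11 + static_cast<std::uint64_t>(c - 'A');
    assert(c == '_' && "only [0-9A-Za-z_] is supported in this sketch");
    return 37;
}

// Pack up to 12 characters starting at `pos` into one base-38 number.
// 38^12 is about 9.07e18, below 2^64 (about 1.84e19), so the value cannot
// overflow and case-folded inputs that differ give different results.
inline std::uint64_t pack12(const std::string& s, std::size_t pos = 0) {
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < 12; ++i) {
        const std::uint64_t code =
            (pos + i < s.size()) ? symbol_code(s[pos + i]) : 0;  // pad with 0
        h = h * 38 + code;
    }
    return h;
}

// Longer identifiers: one 64-bit chunk per 12 characters.
struct ChunkedHash {
    std::vector<std::uint64_t> chunks;

    explicit ChunkedHash(const std::string& s) {
        // Always emit at least one chunk, even for an empty string.
        for (std::size_t pos = 0; pos == 0 || pos < s.size(); pos += 12)
            chunks.push_back(pack12(s, pos));
    }

    bool operator==(const ChunkedHash& other) const {
        if (chunks.size() != other.chunks.size()) return false;
        for (std::size_t i = 0; i < chunks.size(); ++i)
            if (chunks[i] != other.chunks[i]) return false;  // short-circuit
        return true;
    }
};
```

With this layout a 100 - 300 character identifier becomes 9 to 25 chunks, so an equality test is at most 25 integer compares, which is consistent with the hash compares beating character-by-character string compares.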

4 answers

64-bit integers aren't substantially slower on a 32-bit x86 architecture. They're not as fast as 32-bit ints, obviously, but they aren't notably slower. It's not at all reckless to use a 64-bit int for hashes, regardless of x86 or x64. The additional overhead will likely be minimal compared to, say, a couple of unneeded dynamic allocations or poorly chosen algorithms.

I don't think that comparing four 32-bit variables will be faster than comparing two 64-bit variables, since I'd expect the compiler to generate the fastest code: if your processor doesn't support 64-bit operations, your compiler will generate code that does each comparison in two steps, just as you would by hand.
This of course depends on your compiler.
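To make the "two steps" concrete, here is a small sketch of what comparing a 64-bit value by hand as two 32-bit halves looks like. The helper names `low32`, `high32` and `equal_by_halves` are invented for this example; whether a given compiler emits exactly this sequence for a plain `a == b` on a 32-bit target is an assumption you would need to check in its assembly output.

```cpp
#include <cstdint>

// Split a 64-bit value into the two 32-bit words a 32-bit CPU works with.
constexpr std::uint32_t low32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v);
}
constexpr std::uint32_t high32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}

// Hand-written two-step equality: the second compare is skipped as soon as
// the low words differ.
constexpr bool equal_by_halves(std::uint64_t a, std::uint64_t b) {
    return low32(a) == low32(b) && high32(a) == high32(b);
}

// The plain comparison; the compiler picks the instruction sequence.
constexpr bool equal_native(std::uint64_t a, std::uint64_t b) {
    return a == b;
}

static_assert(equal_by_halves(0x1234567890ABCDEFULL, 0x1234567890ABCDEFULL),
              "identical values compare equal");
static_assert(!equal_by_halves(0x0000000100000000ULL, 0x0000000200000000ULL),
              "a difference in the high word is detected");
```

Both functions give the same answer for every input, so there is little reason to write the split version yourself: `equal_native` leaves the choice to the compiler on each target.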

Anyway, there are other tools that will make your comparisons even faster but are not available everywhere, for example vector operations (provided by the SSE extensions), which can compare 16 bytes (4*4) in a single instruction.

If you need to optimize your code as much as possible, I'd suggest adding some preprocessor directives so that these optimizations are enabled only when the system supports them.
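A minimal sketch of that idea, assuming SSE2 as the optional feature: the preprocessor selects an intrinsic-based 16-byte comparison when the compiler reports SSE2 support and falls back to `std::memcmp` otherwise. The macro `HASHCMP_USE_SSE2` and the functions `equal16` and `equal_chunks` are hypothetical names. The detection macros (`__SSE2__` for GCC/Clang, `_M_X64` and `_M_IX86_FP` for MSVC) only reflect compile-time settings, not a runtime CPU check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HASHCMP_USE_SSE2 1
#else
#define HASHCMP_USE_SSE2 0
#endif

// Returns true when the 16 bytes at `a` and `b` are identical.
inline bool equal16(const void* a, const void* b) {
#if HASHCMP_USE_SSE2
    // Unaligned loads, so the pointers need no special alignment.
    const __m128i va = _mm_loadu_si128(static_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(static_cast<const __m128i*>(b));
    // Compare all 16 byte lanes at once; movemask collects one bit per lane,
    // so 0xFFFF means every byte matched.
    return _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) == 0xFFFF;
#else
    return std::memcmp(a, b, 16) == 0;  // portable fallback
#endif
}

// Compare `n` 64-bit chunks two at a time. `n` must be even in this sketch.
inline bool equal_chunks(const std::uint64_t* a, const std::uint64_t* b,
                         std::size_t n) {
    assert(n % 2 == 0);
    for (std::size_t i = 0; i < n; i += 2)
        if (!equal16(a + i, b + i)) return false;  // stop at first mismatch
    return true;
}
```

Callers only ever see `equal_chunks`, so the same source builds on targets with and without SSE2, and it is worth measuring before assuming the vector path wins for short hash lists.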

Are you sure it would greatly lessen the effectiveness of the hash function? Have you run tests? Certainly 64 bits is a better hash than 32 bits if (i) the number of items hashed is significantly more than 2^16 and (ii) computing the 64-bit hash is cheap. Which of (i) or (ii) (or both) is true in your case? If performance is important, you might want to use different hash functions depending on the underlying operating system. Otherwise, I would say: write a 32-bit version and a 64-bit version; try them both out on a 64-bit system and a 32-bit system; and you'll see whether it's worth busting a gut over.
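The 2^16 threshold in (i) comes from the birthday bound: with a b-bit hash, collisions become likely once the item count approaches 2^(b/2). The sketch below uses the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) and assumes the hash values are uniformly distributed, which real hash functions only approximate. The function name `collision_probability` is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of `n` items share a hash value
// when each item is hashed uniformly into `bits`-bit values.
inline double collision_probability(double n, int bits) {
    assert(n >= 0.0 && bits > 0);
    // x = n(n-1) / 2^(bits+1); ldexp(1.0, k) computes 2^k exactly.
    const double x = n * (n - 1.0) / std::ldexp(1.0, bits + 1);
    return -std::expm1(-x);  // 1 - exp(-x), accurate even for tiny x
}
```

For 2^16 = 65536 items this gives roughly a 39% chance of at least one collision with a 32-bit hash, against about 1e-10 with a 64-bit hash, which is why the item count matters more than the word size when choosing between the two.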