
Why is std::hash not guaranteed to be deterministic?

Original post: 2020-03-06 16:34:37 7 2 c++/ hash/ language-lawyer/ std

Hereafter, we use N4140 (the C++14 Standard).

According to § 17.6.3.4 Hash requirements,

The value returned shall depend only on the argument k for the duration of the program.

[ Note: Thus all evaluations of the expression h(k) with the same value for k yield the same result for a given execution of the program. - end note ]

and § 20.9.12 Class template hash says

...

the instantiation hash<Key> shall:

(1.1) - satisfy the Hash requirements (17.6.3.4) ...

(1.2) - ...

This means the hash value of value (i.e. std::hash<decltype(value)>{}(value)) may differ if you restart the program.

But why? This limitation was not in the C++11 Standard, but it is in the C++14, C++17 and C++20 Standards. As a user (not an STL developer), I would find it quite useful if std::hash were deterministic. Are there any mathematical difficulties in implementing a deterministic hash function? The hash functions we use daily (e.g. the deprecated md5sum or the safer sha256) are all deterministic. Is it a matter of efficiency?

2 answers

The hash function is not required to be deterministic between runs, but you can still provide your own hash, e.g. for unordered containers, if that is a behavior you rely on.
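As a rough illustration of "provide your own hash", here is a minimal sketch of a hasher whose result depends only on the bytes of the key. It uses 64-bit FNV-1a, which is my choice for the example and not something the Standard or the answer prescribes. The names Fnv1aHash, StableMap and check_stable_map are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical user-defined hasher: 64-bit FNV-1a over the bytes of a string.
// The result depends only on the input bytes, so it is the same in every run.
struct Fnv1aHash {
    std::size_t operator()(const std::string& s) const noexcept {
        std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
        for (unsigned char c : s) {
            h ^= c;                  // mix in one byte
            h *= 0x100000001b3ULL;   // FNV-1a 64-bit prime
        }
        return static_cast<std::size_t>(h);
    }
};

// The hasher is passed as the third template argument of the container.
using StableMap = std::unordered_map<std::string, int, Fnv1aHash>;

// Usage sketch: the container behaves as usual, only the hasher differs.
inline void check_stable_map() {
    StableMap m;
    m["a"] = 1;
    m["b"] = 2;
    assert(m.at("a") == 1 && m.size() == 2);
    // The empty string hashes to the offset basis (truncated on 32-bit targets).
    assert(Fnv1aHash{}("") == static_cast<std::size_t>(0xcbf29ce484222325ULL));
}
```

Note that a deterministic hash does not by itself make the iteration order of an unordered container portable, since bucket layout is still up to the implementation.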

As for why, cppreference says:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision denial-of-service attacks.

If the Hash requirements demanded determinism across runs, you could not provide a salted hash without violating them.
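To make "salted hash" concrete, here is a minimal sketch, assuming a salt that is drawn once per process and mixed into the starting state of an FNV-1a loop. Within one execution h(k) depends only on k, so it still satisfies the Hash requirements, but the values change from run to run. The names process_salt, SaltedStringHash and check_salted_hash are hypothetical, and this is only an illustration of the idea, not a vetted defense; real code would more likely use a keyed hash such as SipHash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical helper: a salt drawn once per process, on first use.
// A function-local static is initialized exactly once, so the salt is
// fixed for the rest of the execution.
inline std::uint64_t process_salt() {
    static const std::uint64_t salt = [] {
        std::random_device rd;
        return (static_cast<std::uint64_t>(rd()) << 32) ^ rd();
    }();
    return salt;
}

// Hypothetical salted hasher: FNV-1a whose initial state is perturbed
// by the per-process salt (an assumption made for this sketch).
struct SaltedStringHash {
    std::size_t operator()(const std::string& s) const {
        std::uint64_t h = 0xcbf29ce484222325ULL ^ process_salt();
        for (unsigned char c : s) {
            h ^= c;
            h *= 0x100000001b3ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

// Within one run, every hasher instance agrees on every key.
inline void check_salted_hash() {
    SaltedStringHash h1, h2;
    assert(h1("key") == h2("key"));
    assert(process_salt() == process_salt());
}
```

The salt is mixed into the hash state and not XORed onto the final value, because XORing a constant onto the result would leave every pair of colliding keys colliding.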

Here is the actual explanation why:

This answer (and the links in it), suggested by @NathanOliver, is ultimately helpful. Let me cite the important parts.

For a non-cryptographic hash function, it's possible to pre-calculate massive inputs with the same hashed value to algorithmically slow down the unordered containers, and results in a denial-of-service attack.

(from Issue 2291. std::hash is vulnerable to collision DoS attack)
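The effect described in that quote can be modeled without a real attack. The sketch below assumes the worst case, a hasher that maps every key to the same value, which is what an attacker approximates by pre-computing colliding inputs. All elements then land in one bucket, so each lookup or insert has to walk a chain as long as the container. CollidingHash and largest_bucket are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical worst-case hasher: every key collides.
struct CollidingHash {
    std::size_t operator()(int) const noexcept { return 0; }
};

// Inserts n distinct keys and reports the size of the bucket holding key 0.
// With CollidingHash all keys share one hash value, hence one bucket.
inline std::size_t largest_bucket(std::size_t n) {
    std::unordered_set<int, CollidingHash> s;
    for (std::size_t i = 0; i < n; ++i) {
        s.insert(static_cast<int>(i));
    }
    if (s.empty()) {
        return 0;  // nothing inserted, no bucket to inspect
    }
    return s.bucket_size(s.bucket(0));
}

inline void check_largest_bucket() {
    // All 1000 elements end up in the same bucket.
    assert(largest_bucket(1000) == 1000);
}
```

With a hash that is fixed and publicly known, inputs with this property can be prepared in advance; a per-run salt is meant to make such preparation impractical.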

For this reason, language designers are migrating to random hashing. In random hashing, the hash value of the string “a” can change every time you run your program. Random hashing is now the default in Python (as of version 3.3), Ruby (as of version 1.9) and Perl (as of version 5.18).

(from Do you realize that you are using random hashing?)

Move to Ready, rather than Immediate, as even the permission has been contentious in reflector discussion

(from Issue 2291. std::hash is vulnerable to collision DoS attack)

In practice, as far as I understand, no implementation of std::hash implements random hashing but you can write your own my::secure_hash.

(from this answer)