
Using hashes as IDs in key-value stores

I'm wondering whether it would be a good idea to use hashes (CityHash, Murmur and the like) as keys in a key-value store like Hazelcast. I expect to have about 2,000,000,000 records (URLs) in the database, so collisions could happen. Losing some data to hash collisions would not be critical, but of course it would be best to avoid them.
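To put a rough number on "collisions could happen", here is a minimal sketch of the standard birthday estimate for 2,000,000,000 keys. It assumes the hash output is uniformly distributed, which real functions like CityHash or Murmur only approximate; the function names are made up for illustration.

```python
import math

def expected_collisions(n: int, bits: int) -> float:
    """Expected number of colliding key pairs among n uniform hashes of `bits` bits."""
    return n * (n - 1) / 2 / 2**bits

def probability_of_any_collision(n: int, bits: int) -> float:
    """Poisson approximation of the chance that at least one pair collides."""
    return -math.expm1(-expected_collisions(n, bits))

N = 2_000_000_000  # record count from the question

for bits in (32, 64, 128):
    print(bits, expected_collisions(N, bits), probability_of_any_collision(N, bits))
```

Under that assumption, a 32-bit hash would produce hundreds of millions of colliding pairs, a 64-bit hash gives roughly a 10% chance of at least one collision across the whole data set, and a 128-bit hash makes a collision practically impossible.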

A record contains the URL, a timestamp, and a status code. The main operations are inserting a record and looking up whether a URL already exists.

So, given that speed matters, what would you suggest:

  • using an ID generator, or
  • using a hash algorithm like CityHash or Murmur, or
  • using the relevant string itself, a URL in this case?
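The three options above can be compared in a small sketch. Plain Python dicts stand in for the key-value store, and an 8-byte BLAKE2b digest stands in for a 64-bit CityHash or Murmur value, since neither is in the Python standard library; all function names here are hypothetical.

```python
import hashlib
import itertools
import time

_next_id = itertools.count(1)  # trivial stand-in for an ID generator

def key_from_id_generator() -> int:
    """Option 1: a generated ID, unrelated to the URL."""
    return next(_next_id)

def key_from_hash(url: str) -> bytes:
    """Option 2: a fixed-size 64-bit digest of the URL (stand-in hash)."""
    return hashlib.blake2b(url.encode("utf-8"), digest_size=8).digest()

def key_from_url(url: str) -> str:
    """Option 3: the URL itself."""
    return url

def make_record(url: str, status_code: int) -> dict:
    return {"url": url, "timestamp": time.time(), "status_code": status_code}

url = "https://example.com/page"
record = make_record(url, 200)

by_id = {key_from_id_generator(): record}
by_hash = {key_from_hash(url): record}
by_url = {key_from_url(url): record}

# "Does this URL already exist?" is a direct key lookup for options 2 and 3.
print(key_from_hash(url) in by_hash)
print(key_from_url(url) in by_url)
# With generated IDs the key cannot be derived from the URL, so the check
# needs a scan (shown here) or a separate URL-to-ID index.
print(any(r["url"] == url for r in by_id.values()))
```

In this sketch only the hashed key can merge two different URLs into one entry. The URL key costs more bytes per entry, and the generated ID needs an extra index to answer the existence question.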

Hazelcast does not rely on the hashCode/equals methods of the key object; instead, it uses the Murmur hash of the key's binary (serialized) representation.

In short, you don't really need to worry about hash collisions.
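One way to read this answer: if the store hashes the key bytes only to decide where an entry lives, and still compares the full key bytes on lookup, then two URLs with the same hash end up side by side rather than overwriting each other. The sketch below is a simplified model of that idea, not Hazelcast's actual code. `PartitionedStore` is a hypothetical class, CRC32 stands in for Murmur, and the default of 271 partitions is assumed to match Hazelcast's default.

```python
import zlib

PARTITION_COUNT = 271  # assumed default, used here only for illustration

class PartitionedStore:
    """Toy store: the hash picks a partition, the full key bytes identify the entry."""

    def __init__(self, partition_count: int = PARTITION_COUNT):
        self._partitions = [dict() for _ in range(partition_count)]

    def _partition_for(self, key_bytes: bytes) -> dict:
        # The hash of the serialized key is used only for placement.
        return self._partitions[zlib.crc32(key_bytes) % len(self._partitions)]

    def put_if_absent(self, url: str, record: dict) -> bool:
        """Insert the record unless the URL is already stored; return True if inserted."""
        key_bytes = url.encode("utf-8")
        partition = self._partition_for(key_bytes)
        if key_bytes in partition:
            return False
        partition[key_bytes] = record
        return True

    def contains(self, url: str) -> bool:
        key_bytes = url.encode("utf-8")
        return key_bytes in self._partition_for(key_bytes)

# A single partition forces every key to share the same placement,
# yet distinct URLs remain distinct entries.
store = PartitionedStore(partition_count=1)
print(store.put_if_absent("https://example.com/a", {"status_code": 200}))
print(store.put_if_absent("https://example.com/b", {"status_code": 404}))
print(store.put_if_absent("https://example.com/a", {"status_code": 500}))
print(store.contains("https://example.com/b"))
```

In this model, using the URL itself as the key means a hash collision affects only placement, never which record is returned.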
