Fast 64-bit deterministic hash in Python

Question

I have previously been using adler32 to produce a 32-bit hash of blocks of text (which I then use as a filename for saving a cache of the processed version of that text), e.g.

  from zlib import adler32
  hashed_file_name = adler32(paragraph.encode())

I am looking to increase the hash size to reduce the likelihood of collisions (i.e. two different blocks of text getting the same hash code). Given that I have around 10 million text blocks, I think a 32-bit hash would give a collision in about 0.2% of cases, i.e. 10 million / 2^32.
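As a rough check of that estimate, here is a minimal sketch of the arithmetic. It assumes an ideal, uniformly distributed hash, which adler32 may not be for short inputs, so the real collision rate could be higher. The function name `collision_estimates` is made up for illustration.

```python
def collision_estimates(n_items, bits):
    """Approximate collision figures for n_items hashed into 2**bits values.

    Assumes an ideal uniform hash (an assumption, not a property of adler32).
    Returns (chance that one given item collides with some other item,
             expected number of colliding pairs overall).
    """
    space = 2 ** bits
    per_item = (n_items - 1) / space
    expected_pairs = n_items * (n_items - 1) / (2 * space)
    return per_item, expected_pairs


for bits in (32, 64):
    per_item, pairs = collision_estimates(10_000_000, bits)
    print(f"{bits}-bit: per-item chance ~ {per_item:.2e}, "
          f"expected colliding pairs ~ {pairs:.2e}")
```

Under that assumption the 32-bit per-item figure comes out a little above 0.2%, in line with the 10 million / 2^32 estimate, while the 64-bit figures are many orders of magnitude smaller.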

My question is: what is the fastest hash generator that produces at least a 64-bit hash? Would sha1 be an efficient option (it produces a 160-bit hash)? I.e.

  import hashlib
  hashed_file_name = hashlib.sha1(paragraph.encode()).hexdigest()

While this is overkill for my needs, are other versions/options more efficient in terms of processing time?

Answer 1

MD5 is 128-bit and doesn't add a dependency. It's probably fast enough, although you know your requirements better than I do. Another thought is to apply your 32-bit hash twice, after permuting the data, say by an XOR or a rotation.
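Both ideas could be sketched roughly as follows. The helper names `md5_64` and `double_adler32`, the choice to keep only the first 64 bits of the MD5 digest, and the XOR mask `0x5A` are all assumptions made for illustration, not something the question specifies.

```python
import hashlib
from zlib import adler32


def md5_64(text):
    """Return the first 64 bits of the MD5 digest as 16 hex characters."""
    return hashlib.md5(text.encode()).hexdigest()[:16]


def double_adler32(text, mask=0x5A):
    """Combine two adler32 passes into one 64-bit value (16 hex characters).

    The second pass hashes the same bytes after XOR-ing each byte with
    `mask` (an arbitrary, hypothetical constant). The two halves are not
    independent, so this is probably weaker than a true 64-bit hash.
    """
    data = text.encode()
    permuted = bytes(b ^ mask for b in data)
    high = adler32(data)      # unsigned 32-bit value in Python 3
    low = adler32(permuted)
    return f"{(high << 32) | low:016x}"


paragraph = "some block of text"
print(md5_64(paragraph))
print(double_adler32(paragraph))
```

Either return value is a fixed-length hex string that is deterministic across runs and can be used directly as a file name. If speed matters, it is worth timing both on your own data (for example with `timeit`) before choosing.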
