Quick 64-bit deterministic hash in Python

I have previously been using adler32 to produce a 32-bit hash of blocks of text (which I then use as a filename for saving a cache of the processed version of that text), e.g.:

  from zlib import adler32
  hashed_file_name = adler32(paragraph.encode())

I am looking to increase the hash size to reduce the likelihood of collisions (i.e. two different blocks of text getting the same hash code). Given that I have around 10 million text blocks, I think a 32-bit hash would give collisions in about 0.2% of cases (i.e. 10 million / 2^32).
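As a rough check of that estimate, the sketch below computes two figures for an idealised, uniformly distributed hash: the chance that a given block shares its hash with some other block, and the expected number of colliding pairs. The function name `collision_estimates` is made up for illustration, and the uniformity assumption is optimistic for adler32, which is a checksum rather than a well-mixed hash.

```python
import math

def collision_estimates(n_items, hash_bits):
    """Return (per-item collision chance, expected colliding pairs)
    for an ideal hash that spreads values uniformly over 2**hash_bits."""
    space = 2 ** hash_bits
    # Chance that at least one of the other n-1 items lands on the same value.
    # log1p/expm1 keep precision when 1/space is tiny (e.g. 64-bit hashes).
    per_item = -math.expm1((n_items - 1) * math.log1p(-1.0 / space))
    # Expected number of colliding pairs: C(n, 2) / space.
    expected_pairs = n_items * (n_items - 1) / 2 / space
    return per_item, expected_pairs

for bits in (32, 64):
    per_item, pairs = collision_estimates(10_000_000, bits)
    print(f"{bits}-bit: per-item chance {per_item:.3e}, expected pairs {pairs:.3e}")
```

Under these assumptions the 32-bit case comes out at roughly 0.23% per block, in line with the 10 million / 2^32 figure, while the 64-bit case is smaller by many orders of magnitude.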

My question is: what is the fastest hash generator that produces at least a 64-bit hash? Would sha1, which produces a 160-bit hash, be an efficient option? I.e.:

  import hashlib
  hashed_file_name = hashlib.sha1(paragraph.encode()).hexdigest()

While this is overkill for my needs, are other versions/options more efficient in terms of processing time?

MD5 is 128-bit and doesn't add a dependency. It's probably fast enough, although you know your requirements better than I do. Another thought is to apply your 32-bit hash twice, after permuting the data, say by an XOR or rotation.
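Both suggestions can be sketched with the standard library alone. The helper names `md5_64` and `double_adler32` and the XOR mask `0x5A` are illustrative choices, not part of the original answer. Truncating the MD5 hex digest to 16 characters is one way to get a 64-bit name, assuming the full 128 bits are not wanted.

```python
import hashlib
import zlib

def md5_64(text):
    """First 64 bits of the MD5 digest, as 16 hex characters."""
    return hashlib.md5(text.encode()).hexdigest()[:16]

def double_adler32(text):
    """Two adler32 passes, the second over XOR-permuted bytes,
    concatenated into a single 64-bit value (16 hex characters)."""
    data = text.encode()
    # 0x5A is an arbitrary mask chosen for illustration only.
    permuted = bytes(b ^ 0x5A for b in data)
    combined = (zlib.adler32(data) << 32) | zlib.adler32(permuted)
    return f"{combined:016x}"

paragraph = "some block of text"
print(md5_64(paragraph))
print(double_adler32(paragraph))
```

The two adler32 values are computed from closely related inputs, so they are unlikely to behave like 64 independent bits. The truncated MD5 is probably the safer choice when collision resistance matters more than raw speed.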
