
High performance string hashing function in Java/Scala

I'm looking for a high-performance String hashing function in Java/Scala: something faster than the functions in the MurmurHash family. It doesn't need to be cryptographically strong; it only needs to distribute well.

Any suggestions?

The fastest hashing algorithm that fits the bill at present seems to be xxHash. The lz4-java project contains an implementation ported to Java. I don't know whether the Java implementation has been benchmarked against MurmurHash, though; performance optimizations in C++ don't always carry over to Java, or vice versa. (In particular, xxHash does more array accesses, so there could be non-negligible bounds-checking overhead.)

Edit: it looks to me like it uses JNI to call the C++ implementation of xxHash. JNI overhead is non-negligible, so the performance concerns remain.

However, given that Scala includes a MurmurHash function, and that Java has a default hash that is about 2x faster and is sometimes reasonably well distributed, one does wonder whether it's really necessary. For instance, scala.util.hashing.MurmurHash3 is about as fast as creating a string from an array of bytes, and is twice as fast as that if you give it an array of bytes directly.
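For reference, the Java default hash mentioned above is the polynomial that String.hashCode() is documented to compute. The sketch below re-implements that formula so its cost (one multiply and one add per char) is visible; the class and method names are hypothetical and chosen only for illustration.

```java
// Sketch of the polynomial documented for java.lang.String.hashCode():
// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], in wrapping int arithmetic.
final class DefaultStringHash {
    // Hypothetical helper: recomputes the documented formula char by char.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // overflow wraps, as in hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        String key = "hello";
        // Compare the hand-rolled result with the built-in one.
        System.out.println(polyHash(key) == key.hashCode());
    }
}
```

Because the loop is so simple, it is fast, but keys that differ only slightly can produce nearby hash values, which is why its distribution is only sometimes good enough.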

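You can find very fast Java hash function implementations, which BTW take the internal String implementation (the char[] array) into account to maximize speed, here: https://github.com/OpenHFT/Zero-Allocation-Hashing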
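To illustrate what hashing a String without intermediate allocation can look like, here is a minimal sketch that reads chars in place instead of calling getBytes() first. It is not the Zero-Allocation-Hashing library's code: it swaps in 64-bit FNV-1a as a simple stand-in hash, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: 64-bit FNV-1a used as a stand-in hash, not the library's algorithm.
final class CharHashSketch {
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Allocation-free path: feeds the low byte, then the high byte, of each char.
    static long hashChars(CharSequence s) {
        long h = FNV_OFFSET;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            h = (h ^ (c & 0xff)) * FNV_PRIME;
            h = (h ^ (c >>> 8)) * FNV_PRIME;
        }
        return h;
    }

    // Allocating path for comparison: the same hash over an explicit byte array.
    static long hashBytes(byte[] bytes) {
        long h = FNV_OFFSET;
        for (byte b : bytes) {
            h = (h ^ (b & 0xff)) * FNV_PRIME;
        }
        return h;
    }

    public static void main(String[] args) {
        String key = "hello";
        long direct = hashChars(key);
        long viaBytes = hashBytes(key.getBytes(StandardCharsets.UTF_16LE));
        // Both paths see the same byte sequence, so the results agree.
        System.out.println(direct == viaBytes);
    }
}
```

The two methods agree for well-formed strings, but only hashBytes needs a temporary array per key, which is the kind of per-call cost an allocation-free design tries to avoid.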
