How to implement the Hashable Protocol in Swift for an Int array (a custom string struct)
I am making a structure that acts like a String, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32. (See this question for more background.)
I want to be able to use my custom ScalarString struct as a key in a dictionary. For example:
var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value
// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...
// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
// do something with value
}
In order to do that, ScalarString needs to implement the Hashable protocol. I thought I would be able to do something like this:
struct ScalarString: Hashable {
private var scalarArray: [UInt32] = []
var hashValue : Int {
get {
return self.scalarArray.hashValue // error
}
}
}
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.hashValue == right.hashValue
}
but then I discovered that Swift arrays don't have a hashValue.
The article Strategies for Implementing the Hashable Protocol in Swift had a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Here are some other things I read:
Swift Strings have a hashValue property, so I know it is possible to do.
How would I create a hashValue for my custom structure?
Update 1: I would like to do something that does not involve converting to String and then using String's hashValue. My whole point in making my own structure was to avoid doing lots of String conversions. String gets its hashValue from somewhere. It seems like I could get mine using the same method.
Update 2: I've been looking into implementations of string hash code algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.
Update 3: I would prefer not to import any external frameworks unless that is the recommended way to go for these things.
I submitted a possible solution using the DJB Hash Function.
As of Swift 4.1, the compiler can synthesize Equatable and Hashable conformance for a type automatically, if all of its members conform to Equatable/Hashable (SE-0185). And as of Swift 4.2, a high-quality hash combiner is built into the Swift standard library (SE-0206).
Therefore there is no longer any need to define your own hashing function; it suffices to declare the conformance:
struct ScalarString: Hashable, ... {
    private var scalarArray: [UInt32] = []
    // ...
}
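As a minimal sketch of what the synthesized conformance buys you, the following assumes Swift 4.1 or later. The initializer and the scalar values are hypothetical, added only so the example can build keys; they are not part of the original type.

```swift
// Sketch: Equatable and Hashable are synthesized by the compiler,
// because the only stored property ([UInt32]) is itself Hashable.
struct ScalarString: Hashable {
    private var scalarArray: [UInt32] = []

    // Hypothetical initializer, present only to construct example values.
    init(_ scalars: [UInt32]) {
        scalarArray = scalars
    }
}

var suffixDictionary = [ScalarString: ScalarString]()
let key = ScalarString([0x1820, 0x1821])
let value = ScalarString([0xE264])
suffixDictionary[key] = value

// A separately constructed but equal key finds the same entry.
let found = suffixDictionary[ScalarString([0x1820, 0x1821])]
// A different key finds nothing.
let missing = suffixDictionary[ScalarString([0x1821])]
```

No == function or hashValue property is written by hand here; both come from the declared conformance.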
Thus, the answer below needs to be rewritten (yet again). Until that happens, refer to Martin R's answer from the link above.
This answer has been completely rewritten after submitting my original answer to code review.
The Hashable protocol allows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to implement the Equatable protocol (which Hashable inherits from) and return a computed hashValue.
These points follow from the axiom given in the documentation:
x == y implies x.hashValue == y.hashValue
where x and y are values of some type.
In order to implement the Equatable protocol, you define how your type uses the == (equivalence) operator. In your example, equivalence can be determined like this:
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}
The == function is global so it goes outside of your class or struct.
Your custom class or struct must also have a computed hashValue variable. A good hash algorithm will provide a wide range of hash values. However, you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. A collision requires some extra work (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the == function does that extra work. (Update: It looks like == may do all the work.)
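To illustrate how == resolves collisions, here is a small sketch with a deliberately poor hash. It assumes Swift 4.2 or later (for hash(into:)), and the type name BadlyHashed is hypothetical. Every value feeds the same input to the hasher, so all values collide, yet dictionary lookups still return the right entries because equality is checked on collision.

```swift
// Sketch: a type whose hash is the same for every value.
struct BadlyHashed: Hashable {
    let scalars: [UInt32]

    static func == (left: BadlyHashed, right: BadlyHashed) -> Bool {
        return left.scalars == right.scalars
    }

    // Deliberately poor: the stored data is ignored, so all values collide.
    func hash(into hasher: inout Hasher) {
        hasher.combine(0)
    }
}

let a = BadlyHashed(scalars: [1, 2, 3])
let b = BadlyHashed(scalars: [4, 5, 6])

var table = [BadlyHashed: String]()
table[a] = "first"
table[b] = "second"

// Both keys hash identically, but == keeps the entries apart.
let collides = a.hashValue == b.hashValue
let lookupA = table[a]
let lookupB = table[b]
```

This stays correct but gets slow as the dictionary grows, which is the practical reason to want a well-distributed hash.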
There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.
var hashValue: Int {
return self.scalarArray.count
}
This would give a hash collision every time two arrays had the same number of elements but different values. NSArray apparently uses this approach.
DJB Hash Function
A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.
A Swift implementation provided by @MartinR follows:
var hashValue: Int {
return self.scalarArray.reduce(5381) {
($0 << 5) &+ $0 &+ Int($1)
}
}
This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce. This is equivalent, I believe:
var hashValue: Int {
// DJB Hash Function
var hash = 5381
// C-style for loops and ++ were removed in Swift 3, so iterate directly
for scalar in self.scalarArray {
hash = ((hash << 5) &+ hash) &+ Int(scalar)
}
return hash
}
The &+ operator allows Int to overflow and wrap around instead of crashing, which matters for long strings.
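The same computation can be tried outside the struct. This is only a sketch: djb2Hash is a hypothetical free-function name, and the Int(scalar) conversion assumes a 64-bit platform, where every UInt32 fits in an Int.

```swift
// Standalone DJB hash over UInt32 scalars (hypothetical helper name).
func djb2Hash(_ scalars: [UInt32]) -> Int {
    var hash = 5381
    for scalar in scalars {
        // hash * 33 + scalar, written as a shift plus wrapping additions
        hash = ((hash << 5) &+ hash) &+ Int(scalar)
    }
    return hash
}

let emptyHash = djb2Hash([])       // no elements: the seed is returned
let singleHash = djb2Hash([1])     // 5381 * 33 + 1
let forward = djb2Hash([1, 2])
let backward = djb2Hash([2, 1])    // element order changes the result

// Wrapping addition: plain + would trap here, &+ wraps around.
let wrapped = Int.max &+ 1

// A long input overflows many times without crashing.
let longHash = djb2Hash([UInt32](repeating: 0x10FFFF, count: 10_000))
```

Unlike the count-based hash, this one distinguishes arrays that hold the same elements in a different order.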
We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarString is the custom type from the question. This will be different for different people, of course.
// Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {
private var scalarArray: [UInt32] = []
// required var for the Hashable protocol
var hashValue: Int {
// DJB hash function
return self.scalarArray.reduce(5381) {
($0 << 5) &+ $0 &+ Int($1)
}
}
}
// required function for the Equatable protocol, which Hashable inherits from
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}
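For newer compilers, a hand-written conformance would look somewhat different. The sketch below assumes Swift 4.2 or later, where hash(into:) is the preferred requirement and == is usually written as a static member, so it can still reach the private array. The initializer is hypothetical and exists only to build example values.

```swift
// Sketch of an explicit conformance in Swift 4.2+ style.
struct ScalarString: Hashable {
    private var scalarArray: [UInt32] = []

    // Hypothetical initializer for the example.
    init(_ scalars: [UInt32]) {
        scalarArray = scalars
    }

    // Equality compares the underlying scalar arrays.
    static func == (left: ScalarString, right: ScalarString) -> Bool {
        return left.scalarArray == right.scalarArray
    }

    // Feed the same data used by == into the standard library's hasher.
    func hash(into hasher: inout Hasher) {
        hasher.combine(scalarArray)
    }
}

// Equal values collapse to one element in a Set.
let unique = Set([ScalarString([1, 2]), ScalarString([1, 2]), ScalarString([3])])
```

Hashing exactly the data that == compares keeps the axiom intact: equal values always produce equal hashes.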
A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.
Swift is open source now so it is possible to see how hashValue is implemented for String from the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.
It is not a very elegant solution but it works nicely:
"\(scalarArray)".hashValue
or
scalarArray.description.hashValue
This just uses the textual representation as the hash source.
Edit (31 May '17): Please refer to the accepted answer. This answer is pretty much just a demonstration of how to use the CommonCrypto framework.
Okay, I went ahead and extended all arrays with the Hashable protocol by using the SHA-256 hashing algorithm from the CommonCrypto framework. You have to put
#import <CommonCrypto/CommonDigest.h>
into your bridging header for this to work. It's a shame that pointers have to be used though:
extension Array : Hashable, Equatable {
public var hashValue : Int {
var hash = [Int](count: Int(CC_SHA256_DIGEST_LENGTH) / sizeof(Int), repeatedValue: 0)
withUnsafeBufferPointer { ptr in
hash.withUnsafeMutableBufferPointer { (inout hPtr: UnsafeMutableBufferPointer<Int>) -> Void in
CC_SHA256(UnsafePointer<Void>(ptr.baseAddress), CC_LONG(count * sizeof(Element)), UnsafeMutablePointer<UInt8>(hPtr.baseAddress))
}
}
return hash[0]
}
}
Edit (31 May '17): Don't do this. Even though SHA-256 has pretty much no hash collisions, defining equality by hash equality is the wrong idea.
public func ==<T>(lhs: [T], rhs: [T]) -> Bool {
return lhs.hashValue == rhs.hashValue
}
This is as good as it gets with CommonCrypto. It's ugly, but fast, with pretty much no hash collisions for sure.
Edit (15 July '15): I just ran some speed tests. Hashing randomly filled Int arrays of size n took, on average over 1000 runs:
n -> time
1000 -> 0.000037 s
10000 -> 0.000379 s
100000 -> 0.003402 s
Whereas with the string hashing method:
n -> time
1000 -> 0.001359 s
10000 -> 0.011036 s
100000 -> 0.122177 s
So the SHA-256 way is about 33 times faster than the string way. I'm not saying that using a string is a very good solution, but it's the only one we can compare it to right now.
One suggestion - since you are modeling a String, would it work to convert your [UInt32] array to a String and use the String's hashValue? Like this:
var hashValue : Int {
get {
return String(self.scalarArray.map { UnicodeScalar($0) }).hashValue
}
}
That could conveniently allow you to compare your custom struct against Strings as well, though whether or not that is a good idea depends on what you are trying to do...
Note also that, using this approach, instances of ScalarString would have the same hashValue if their String representations were canonically equivalent, which may or may not be what you desire.
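A short sketch of the canonical-equivalence point, assuming Swift 5. The helper makeString is hypothetical: it builds a String from raw scalar values and skips any value that is not a valid Unicode scalar. The precomposed "é" and the sequence "e" plus a combining acute accent are different UInt32 arrays, but Swift treats the resulting Strings as equal.

```swift
// Hypothetical helper: build a String from raw UInt32 scalar values.
func makeString(_ scalars: [UInt32]) -> String {
    var view = String.UnicodeScalarView()
    for value in scalars {
        // Unicode.Scalar(UInt32) is failable; invalid values are skipped.
        if let scalar = Unicode.Scalar(value) {
            view.append(scalar)
        }
    }
    return String(view)
}

let precomposed: [UInt32] = [0x00E9]          // é as a single scalar
let decomposed: [UInt32] = [0x0065, 0x0301]   // e + combining acute accent

// String comparison uses canonical equivalence, so these match...
let sameString = makeString(precomposed) == makeString(decomposed)
let sameHash = makeString(precomposed).hashValue == makeString(decomposed).hashValue
// ...while the raw scalar arrays do not.
let sameArray = precomposed == decomposed
```

So a String-based hash merges keys that an array-based hash keeps distinct.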
So I suppose that if you want the hashValue to represent a unique String, my approach would be good. If you want the hashValue to represent a unique sequence of UInt32 values, @Kametrixom's answer is the way to go...