How to implement the Hashable Protocol in Swift for an Int array (a custom string struct)

I am making a structure that acts like a String, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32. (See this question for more background.)

What I want to do

I want to be able to use my custom ScalarString struct as a key in a dictionary. For example:

var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value

// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...

// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
    // do something with value
}

Problem

In order to do that, ScalarString needs to implement the Hashable protocol. I thought I would be able to do something like this:

struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    var hashValue : Int {
        get {
            return self.scalarArray.hashValue // error
        }
    }
}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.hashValue == right.hashValue
}

but then I discovered that Swift arrays don't have a hashValue.

What I read

The article Strategies for Implementing the Hashable Protocol in Swift had a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Specifically,

  • Object property (an array does not have a hashValue)
  • ID property (not sure how this could be implemented well)
  • Formula (seems like any formula for a string of 32-bit integers would be processor heavy and have lots of integer overflow)
  • ObjectIdentifier (I'm using a struct, not a class)
  • Inheriting from NSObject (I'm using a struct, not a class)

Here are some other things I read:

Question

Swift Strings have a hashValue property, so I know it is possible to do.

How would I create a hashValue for my custom structure?

Updates

Update 1: I would like to do something that does not involve converting to String and then using String's hashValue. My whole point for making my own structure was so that I could avoid doing lots of String conversions. String gets its hashValue from somewhere. It seems like I could get it using the same method.

Update 2: I've been looking into the implementation of string hash code algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.

Update 3

I would prefer not to import any external frameworks unless that is the recommended way to go for these things.

I submitted a possible solution using the DJB Hash Function.

Update

Martin R writes:

As of Swift 4.1, the compiler can synthesize Equatable and Hashable conformance for types automatically, if all members conform to Equatable/Hashable (SE-0185). And as of Swift 4.2, a high-quality hash combiner is built into the Swift standard library (SE-0206).

Therefore there is no need anymore to define your own hashing function; it suffices to declare the conformance:

struct ScalarString: Hashable, ... {

    private var scalarArray: [UInt32] = []

    // ...
}

Thus, the answer below needs to be rewritten (yet again). Until that happens refer to Martin R's answer from the link above.
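As a minimal sketch of what that looks like in practice, assuming Swift 4.1 or later: the struct below only declares the conformance and lets the compiler synthesize == and the hashing from scalarArray. The init(_:) initializer and the sample strings are hypothetical, added just so the example has some values to work with.

```swift
// Sketch: ScalarString with compiler-synthesized Equatable/Hashable.
struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    // Hypothetical convenience initializer, not part of the original struct.
    init(_ string: String) {
        scalarArray = string.unicodeScalars.map { $0.value }
    }
}

var suffixDictionary = [ScalarString: ScalarString]()
suffixDictionary[ScalarString("abc")] = ScalarString("xyz")

// Look up an equal key and a different key.
let found = suffixDictionary[ScalarString("abc")]
let missing = suffixDictionary[ScalarString("abd")]
```

Here found should hold the stored value and missing should be nil, with no hand-written == or hashValue anywhere.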


Old Answer:

This answer has been completely rewritten after submitting my original answer to code review.

How to implement the Hashable protocol

The Hashable protocol allows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to

  1. Implement the Equatable protocol (Hashable inherits from Equatable)
  2. Return a computed hashValue

These points follow from the axiom given in the documentation:

x == y implies x.hashValue == y.hashValue

where x and y are values of some Type.

Implement the Equatable protocol

In order to implement the Equatable protocol, you define how your type uses the == (equivalence) operator. In your example, equivalence can be determined like this:

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

The == function is global so it goes outside of your class or struct.
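For readers on Swift 3 or later, the operator can alternatively be declared as a static method inside the type, which also lets it read private storage. The following is only a sketch of that form; the init(_:) initializer and the sample values are hypothetical.

```swift
struct ScalarString: Equatable {

    private var scalarArray: [UInt32] = []

    // Hypothetical initializer, added only for this example.
    init(_ scalars: [UInt32]) {
        scalarArray = scalars
    }

    // Declared inside the type, so it can access the private array.
    static func == (left: ScalarString, right: ScalarString) -> Bool {
        return left.scalarArray == right.scalarArray
    }
}

let a = ScalarString([0x61, 0x62])
let b = ScalarString([0x61, 0x62])
let c = ScalarString([0x61, 0x63])
```

With these values, a == b holds because the arrays match element by element, while a == c does not.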

Return a computed hashValue

Your custom class or struct must also have a computed hashValue variable. A good hash algorithm will provide a wide range of hash values. However, it should be noted that you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. It requires some extra work when there is a collision (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the == function does that extra work. (Update: It looks like == may do all the work.)

There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.

var hashValue: Int {
    return self.scalarArray.count
} 

This would give a hash collision every time two arrays had the same number of elements but different values. NSArray apparently uses this approach.
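To see that such collisions cost speed but not correctness, here is a small sketch, assuming Swift 4.2 or later so that hash(into:) is available. CountHashed is a hypothetical type that feeds only the element count to the hasher, so every array of the same length collides, and == is left to tell the keys apart.

```swift
struct CountHashed: Hashable {

    var scalarArray: [UInt32]

    static func == (left: CountHashed, right: CountHashed) -> Bool {
        return left.scalarArray == right.scalarArray
    }

    // Deliberately weak hash: only the element count is combined.
    func hash(into hasher: inout Hasher) {
        hasher.combine(scalarArray.count)
    }
}

let first = CountHashed(scalarArray: [1, 2, 3])
let second = CountHashed(scalarArray: [4, 5, 6])

var table = [CountHashed: String]()
table[first] = "first"
table[second] = "second"
```

Both keys hash identically, yet the dictionary still stores and finds them as two separate entries because == distinguishes them. The price is that lookups among colliding keys fall back to comparing elements.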

DJB Hash Function

A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.

A Swift implementation provided by @MartinR follows:

var hashValue: Int {
    return self.scalarArray.reduce(5381) {
        ($0 << 5) &+ $0 &+ Int($1)
    }
}

This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce. This is equivalent, I believe:

var hashValue: Int {

    // DJB Hash Function
    var hash = 5381

    // for-in replaces the C-style for loop, which was removed in Swift 3
    for scalar in self.scalarArray {
        hash = ((hash << 5) &+ hash) &+ Int(scalar)
    }

    return hash
}

The &+ operator allows Int to overflow and wrap around instead of crashing, which matters for long strings.
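A quick sketch of that wrapping behavior, together with the DJB hash pulled out into a free function so it can be tried on its own. The name djbHash and the sample inputs are hypothetical, not part of the original answer.

```swift
// Overflow addition wraps around instead of trapping at runtime.
let wrapped = Int.max &+ 1

// DJB hash over an array of scalar values: hash * 33 + scalar at each step.
func djbHash(_ scalars: [UInt32]) -> Int {
    return scalars.reduce(5381) { hash, scalar in
        (hash << 5) &+ hash &+ Int(scalar)
    }
}

let empty = djbHash([])         // no elements, so the seed is returned
let single = djbHash([0x61])    // 5381 * 33 + 97 = 177670

// A long input overflows Int many times over but never traps.
let long = djbHash(Array(repeating: 0x10FFFF, count: 10_000))
```

Replacing &+ with a plain + would compile as well, but the long input would then crash with an arithmetic overflow.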

Big Picture

We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarString is the custom type from the question. This will be different for different people, of course.

// Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    // required var for the Hashable protocol
    var hashValue: Int {
        // DJB hash function
        return self.scalarArray.reduce(5381) {
            ($0 << 5) &+ $0 &+ Int($1)
        }
    }
}

// required function for the Equatable protocol, which Hashable inherits from
func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

Other helpful reading

Credits

A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.

Update

Swift is open source now so it is possible to see how hashValue is implemented for String from the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.

It is not a very elegant solution but it works nicely:

"\(scalarArray)".hashValue

or

scalarArray.description.hashValue

This just uses the textual representation as the hash source.

Edit (31 May '17): Please refer to the accepted answer. This answer is pretty much just a demonstration of how to use the CommonCrypto framework.

Okay, I went ahead and extended all arrays with the Hashable protocol by using the SHA-256 hashing algorithm from the CommonCrypto framework. You have to put

#import <CommonCrypto/CommonDigest.h>

into your bridging header for this to work. It's a shame that pointers have to be used though:

extension Array : Hashable, Equatable {
    public var hashValue : Int {
        var hash = [Int](count: Int(CC_SHA256_DIGEST_LENGTH) / sizeof(Int), repeatedValue: 0)
        withUnsafeBufferPointer { ptr in
            hash.withUnsafeMutableBufferPointer { (inout hPtr: UnsafeMutableBufferPointer<Int>) -> Void in
                CC_SHA256(UnsafePointer<Void>(ptr.baseAddress), CC_LONG(count * sizeof(Element)), UnsafeMutablePointer<UInt8>(hPtr.baseAddress))
            }
        }

        return hash[0]
    }
}

Edit (31 May '17): Don't do this. Even though SHA-256 has pretty much no hash collisions, it's the wrong idea to define equality by hash equality:

public func ==<T>(lhs: [T], rhs: [T]) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

This is as good as it gets with CommonCrypto. It's ugly, but fast, and has pretty much no hash collisions for sure.

Edit (15 July '15): I just made some speed tests:

Randomly filled Int arrays of size n took, on average over 1000 runs:

n      -> time
1000   -> 0.000037 s
10000  -> 0.000379 s
100000 -> 0.003402 s

Whereas with the string hashing method:

n      -> time
1000   -> 0.001359 s
10000  -> 0.011036 s
100000 -> 0.122177 s

So the SHA-256 way is about 33 times faster than the string way. I'm not saying that using a string is a very good solution, but it's the only one we can compare it to right now.

One suggestion: since you are modeling a String, would it work to convert your [UInt32] array to a String and use the String's hashValue? Like this:

var hashValue : Int {
    get {
        return String(self.scalarArray.map { UnicodeScalar($0) }).hashValue
    }
}

That could conveniently allow you to compare your custom struct against Strings as well, though whether or not that is a good idea depends on what you are trying to do...

Note also that, using this approach, instances of ScalarString would have the same hashValue if their String representations were canonically equivalent, which may or may not be what you desire.

So I suppose that if you want the hashValue to represent a unique String, my approach would be good. If you want the hashValue to represent a unique sequence of UInt32 values, @Kametrixom's answer is the way to go...
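The canonical equivalence point can be illustrated with a short sketch. The two sample strings are my own choice: both render as é, one as a single precomposed scalar and the other as e followed by a combining acute accent.

```swift
let precomposed = "\u{E9}"      // é as one scalar
let decomposed = "e\u{301}"     // e plus combining acute accent

// String comparison and hashing treat the two as the same value.
let stringsEqual = precomposed == decomposed
let hashesEqual = precomposed.hashValue == decomposed.hashValue

// The underlying scalar sequences are different, though.
let precomposedScalars = precomposed.unicodeScalars.map { $0.value }
let decomposedScalars = decomposed.unicodeScalars.map { $0.value }
```

So a ScalarString hashed through String would treat [0xE9] and [0x65, 0x301] as the same key candidate, while hashing the UInt32 array directly keeps them distinct.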
