How to compare strings containing Unicode characters for equality in Swift?

In my application I am trying to compare values coming from a repo that contains a number of JSON files. Each JSON file holds the values for several countries as a dictionary, for example:

{cz: "Doplňky k Apple TV"
 dk: "Apple TVtilbehør" }  //string1 == "Doplňky k Apple TV"

Similarly, I have a local plist that also holds a dictionary for the same countries, for example:

{cz: "Doplňky*k*Apple*TV"
 dk: "*Apple*TV*Tilbehør*" } //string2 == "Doplňky*k*Apple*TV"

So basically I need to compare the values for each country and then show only the differences to the user.
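Before any comparison, both sources have to end up as plain dictionaries. The snippets above are not strictly valid JSON (the keys are unquoted), so the following minimal sketch assumes the real files are flat objects mapping a country code to a string. The helper names `loadJSONValues` and `loadPlistValues` are hypothetical; `JSONDecoder` and `PropertyListDecoder` are standard Foundation types.

```swift
import Foundation

// Hypothetical helpers. Assumption: both files are flat dictionaries that map a
// country code ("cz", "dk", ...) to a localized string.
func loadJSONValues(from data: Data) throws -> [String: String] {
    try JSONDecoder().decode([String: String].self, from: data)
}

func loadPlistValues(from data: Data) throws -> [String: String] {
    try PropertyListDecoder().decode([String: String].self, from: data)
}

// Inline sample data; "\u00a0" is the JSON escape for NO-BREAK SPACE.
let jsonData = Data(#"{"cz": "Doplňky k Apple\u00a0TV", "dk": "Apple TVtilbehør"}"#.utf8)
let plistData = Data("""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>cz</key>
    <string>Doplňky*k*Apple*TV</string>
    <key>dk</key>
    <string>*Apple*TV*Tilbehør*</string>
</dict>
</plist>
""".utf8)

// Top-level `try`: a decoding error stops the script.
let loadedJSON = try loadJSONValues(from: jsonData)
let loadedPlist = try loadPlistValues(from: plistData)
print(loadedJSON.keys.sorted(), loadedPlist.keys.sorted())
```

Decoding straight into `[String: String]` keeps the rest of the comparison independent of where the values came from.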

So, in this case the cz values in the JSON file (string1) and the local plist (string2) are the same, except that string2 contains asterisks. When I remove the asterisks and compare the strings, they still don't match, because Doplňky k Apple TV in string1 has an invisible Unicode space after Apple that looks like a regular space.

Below is my code to implement the logic:

if string2.replaceString(["*", "\u{00a0}"], " ").trimmingCharacters(in: .whitespaces) == string1.replacingOccurrences(of: "\u{00a0}", with: " "){
  //Do something
}
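`replaceString(_:_:)` is not a Foundation API, so the code above presumably relies on a custom `String` extension that isn't shown. The sketch below is one plausible implementation, assuming it replaces every occurrence of each target with the replacement; the real helper may behave differently.

```swift
import Foundation

// Hypothetical: one possible version of the asker's custom helper.
// Assumption: every occurrence of each target is replaced with `replacement`.
extension String {
    func replaceString(_ targets: [String], _ replacement: String) -> String {
        targets.reduce(self) { result, target in
            result.replacingOccurrences(of: target, with: replacement)
        }
    }
}

let string1 = "Doplňky k Apple\u{00a0}TV"   // JSON value, contains NO-BREAK SPACE
let string2 = "Doplňky*k*Apple*TV"         // local plist value

let lhs = string2.replaceString(["*", "\u{00a0}"], " ").trimmingCharacters(in: .whitespaces)
let rhs = string1.replacingOccurrences(of: "\u{00a0}", with: " ")
print(lhs == rhs)  // true
```

With this assumed helper and these sample values, both sides become Doplňky k Apple TV. If the real comparison still fails, the data probably contains some other character that isn't in the replacement list, which is why hand-picking characters is fragile.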

Answer:

The Doplňky k Apple TV string looks like it comes from the Apple website. When I checked it there, the string contains a NO-BREAK SPACE (U+00A0) between Apple and TV. It's a whitespace character, but it isn't equal to a normal SPACE (U+0020).

"Doplňky k Apple\u{00a0}TV" == "Doplňky k Apple TV" // false
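Look-alike characters like this are easier to find by dumping the Unicode scalars of a string. A small sketch using only the standard library (the function name `nonASCIIScalars` is hypothetical) lists every non-ASCII scalar with its code point and Unicode name:

```swift
// Describes every non-ASCII scalar as "U+XXXX NAME" so invisible look-alikes show up.
func nonASCIIScalars(in string: String) -> [String] {
    string.unicodeScalars
        .filter { !$0.isASCII }
        .map { scalar -> String in
            let hex = String(scalar.value, radix: 16, uppercase: true)
            let padded = String(repeating: "0", count: max(0, 4 - hex.count)) + hex
            return "U+\(padded) \(scalar.properties.name ?? "<unnamed>")"
        }
}

for line in nonASCIIScalars(in: "Doplňky k Apple\u{00a0}TV") {
    print(line)
}
```

Running it on a value copied from the website is a quick way to see whether a "space" is really U+0020.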

First thing to specify: does it matter? Should we treat the two strings as equal or not?
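If the answer is "treat them as equal", one sketch, assuming every Unicode whitespace character should count as a plain space, is to map each `Character` whose `isWhitespace` is true to U+0020. Replacing only U+00A0 by hand misses other look-alikes such as U+202F NARROW NO-BREAK SPACE. The function name `normalizingWhitespace` is hypothetical.

```swift
import Foundation

// Maps every Character that Swift classifies as whitespace (including tabs,
// newlines, NO-BREAK SPACE, NARROW NO-BREAK SPACE) to a plain U+0020 SPACE.
func normalizingWhitespace(_ string: String) -> String {
    String(string.map { $0.isWhitespace ? " " : $0 })
}

let fromWebsite = "Doplňky k Apple\u{00a0}TV"   // NO-BREAK SPACE
let narrow = "Doplňky k Apple\u{202F}TV"        // NARROW NO-BREAK SPACE
let typed = "Doplňky k Apple TV"                // plain SPACE

print(normalizingWhitespace(fromWebsite) == typed)                     // true
print(normalizingWhitespace(narrow) == typed)                          // true
print(narrow.replacingOccurrences(of: "\u{00a0}", with: " ") == typed) // false
```

Note that `isWhitespace` also covers tabs and line breaks; if those must stay significant, a narrower rule is needed.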

Then you have the Apple TVtilbehør and *Apple*TV*Tilbehør* strings. Is that an intentional typo, or should Apple TVtilbehør be Apple TV Tilbehør? Let's assume it's just a typo and that the value should be Apple TV Tilbehør, which is what the sample code below uses.

Next, what are these * characters (at the beginning/end) in the *Apple*TV*Tilbehør* string? Second thing to specify: should we ignore them? Do they represent whitespace?
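The two readings of the asterisks give different results, which is why this has to be specified. A short sketch of both, with hypothetical function names: interpretation A treats every * as a space (the approach the sample code in this answer takes), interpretation B treats * only as padding at the edges.

```swift
import Foundation

// Interpretation A: every * stands for a space; leading/trailing spaces are then trimmed.
func asterisksAsSpaces(_ s: String) -> String {
    s.replacingOccurrences(of: "*", with: " ")
        .trimmingCharacters(in: .whitespaces)
}

// Interpretation B: * is only padding at the beginning/end; inner * are kept as-is.
func asterisksAsPadding(_ s: String) -> String {
    s.trimmingCharacters(in: CharacterSet(charactersIn: "*"))
}

let value = "*Apple*TV*Tilbehør*"
print(asterisksAsSpaces(value))   // Apple TV Tilbehør
print(asterisksAsPadding(value))  // Apple*TV*Tilbehør
```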

Next is Unicode equivalence. How would you like to compare these two strings? Swift helps you here (source):

Comparing strings for equality using the equal-to operator (==) or a relational operator (like < or >=) is always performed using Unicode canonical representation. As a result, different representations of a string compare as being equal.

"Cafe\u{301}" == "Café" // true
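A small sketch of what "canonical representation" means in practice: the two spellings of Café are equal for `==` and hash to the same `Set` element, but their scalar and UTF-8 views differ. This matters if the values are ever compared byte by byte outside Swift's `==`.

```swift
let decomposed = "Cafe\u{301}"   // "e" + COMBINING ACUTE ACCENT
let precomposed = "Caf\u{E9}"    // LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                                // true
print(decomposed.count, precomposed.count)                      // 4 4 (Characters)
print(decomposed.unicodeScalars.count,
      precomposed.unicodeScalars.count)                         // 5 4
print(decomposed.utf8.count, precomposed.utf8.count)            // 6 5
print(Set([decomposed, precomposed]).count)                     // 1 (hashing agrees with ==)
print(Array(decomposed.utf8) == Array(precomposed.utf8))        // false (raw bytes differ)
```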

What about other countries, like Germany, where Straße can also be written as Strasse? Third thing to specify: how should we treat these strings?
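Swift's `==` does not consider ß and ss equivalent, because they are not canonically equivalent. If the specification said they should match, one possible lenient rule (an assumption, not what this answer chooses) is to compare uppercased forms, since the full Unicode uppercase mapping turns ß into SS. The side effect is that the comparison also becomes case-insensitive.

```swift
let street1 = "Straße"
let street2 = "Strasse"

print(street1 == street2)                              // false
print(street1.lowercased() == street2.lowercased())    // false: "straße" vs "strasse"
print(street1.uppercased() == street2.uppercased())    // true: both become "STRASSE"
```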

As you can see, there are a lot of things to think about. Do you have a specification? Follow it. No specification? Then your algorithm will stop working sooner or later.

Playground

I took the liberty of specifying all of these myself:

  • All whitespace characters are equal
  • * stands for a space, and any * at the beginning/end doesn't matter (ignored)
  • Straße does not equal Strasse

Sample code:

import Foundation

let json = [
    // U+00A0 is NO-BREAK SPACE which looks like a normal space (U+0020)
    "cz": "Doplňky k Apple\u{00a0}TV",
    "dk": "Apple TV Tilbehør",
    "en": "Hello",
    "de": "Straße",
    "fr": "Expos\u{00E9}" // Exposé
]

let plist = [
    "cz": "Doplňky*k*Apple*TV",
    "dk": "*Apple*TV*Tilbehør*",
    "es": "Hola",
    "de": "Strasse",
    "fr": "Expose\u{0301}" // Exposé
]

let jsonKeys = Set(json.keys)
let plistKeys = Set(plist.keys)
let commonKeys = jsonKeys.intersection(plistKeys)
let keysMissingInJson = plistKeys.subtracting(jsonKeys)
let keysMissingInPlist = jsonKeys.subtracting(plistKeys)

print("Languages missing in JSON: \(keysMissingInJson.count)")
keysMissingInJson.forEach { key in
    print(" - \(key)")
}

print("Languages missing in PLIST: \(keysMissingInPlist.count)")
keysMissingInPlist.forEach { key in
    print(" - \(key)")
}

let differentValueKeys: [String] = commonKeys.compactMap { key in
    guard let initialJsonValue = json[key], let initialPlistValue = plist[key] else {
        fatalError("Fix commonKeys")
    }
    
    // Replace all whitespace characters with a normal space
    let jsonValue = String(
        initialJsonValue.map { $0.isWhitespace ? " " : $0 }
    )
    
    let plistValue = initialPlistValue
        // Replace all * with a normal whitespace
        .replacingOccurrences(of: "*", with: " ")
        // Trim all whitespace characters from the beginning/end
        .trimmingCharacters(in: .whitespaces)
    
    return jsonValue == plistValue ? nil : key
}

print("Different values: \(differentValueKeys.count)")
differentValueKeys.forEach { key in
    print(" - \(key): JSON(\(json[key]!)) PLIST(\(plist[key]!))")
}

Output:

Languages missing in JSON: 1
 - es
Languages missing in PLIST: 1
 - en
Different values: 1
 - de: JSON(Straße) PLIST(Strasse)
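The sample code normalizes whitespace only on the JSON side, so a NO-BREAK SPACE inside a plist value (for example *Apple\u{00a0}TV*Tilbehør*) would still be reported as a difference. It also iterates a `Set`, so with several differences the report order can change between runs. A variant sketch, assuming the same three rules should apply to both sources, uses one hypothetical `normalized` function for both sides and sorts the keys:

```swift
import Foundation

// Same rules for both sides: every * and every whitespace Character becomes U+0020,
// then leading/trailing spaces are trimmed. ß vs ss is deliberately left untouched.
func normalized(_ value: String) -> String {
    let spaced = String(value.map { ($0 == "*" || $0.isWhitespace) ? " " : $0 })
    return spaced.trimmingCharacters(in: .whitespaces)
}

// Country codes present in both dictionaries whose normalized values differ,
// sorted so the report order is stable.
func differingKeys(_ lhs: [String: String], _ rhs: [String: String]) -> [String] {
    Set(lhs.keys).intersection(rhs.keys)
        .filter { normalized(lhs[$0]!) != normalized(rhs[$0]!) }  // keys exist in both
        .sorted()
}

let jsonValues = ["cz": "Doplňky k Apple\u{00a0}TV", "dk": "Apple TV Tilbehør", "de": "Straße"]
let plistValues = ["cz": "Doplňky*k*Apple*TV", "dk": "*Apple\u{00a0}TV*Tilbehør*", "de": "Strasse"]

print(differingKeys(jsonValues, plistValues))  // ["de"]
```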
