
What is the meaning of 'Strings in Swift are Unicode correct and locale insensitive' in Swift's String documentation?

I found this sentence in Swift's String documentation ( https://developer.apple.com/documentation/swift/string ):

Overview

A string is a series of characters, such as "Swift", that forms a collection. Strings in Swift are Unicode correct and locale insensitive, and are designed to be efficient. The String type bridges with the Objective-C class NSString and offers interoperability with C functions that works with strings.

But I don't fully understand this sentence, and I don't know where to start.

To expand on @matt's answer a little:

The Unicode Consortium maintains certain standards for interoperation of data, and the best known of these is the Unicode Standard itself. This standard defines a huge list of characters and their properties, along with rules for how those characters interact with one another. (As Matt notes: letters, emoji, combining characters [letters with diacritics, like é], etc.)

Swift strings being "Unicode-correct" means that Swift strings conform to this Unicode standard, offering the same characters, rules, and interactions as any other string implementation which conforms to the same standard. Since Unicode is now the main standard that most string implementations conform to, this largely means that Swift strings will "just work" the way that you expect.
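One concrete consequence of being Unicode-correct is that two canonically equivalent spellings of the same text compare as equal. The following is a minimal sketch using only the standard library; the variable names are illustrative and not from the documentation.

```swift
// Two spellings of "café": one uses a precomposed scalar, the other
// uses a plain "e" followed by a combining accent.
let precomposed = "caf\u{E9}"    // U+00E9 LATIN SMALL LETTER E WITH ACUTE
let decomposed = "cafe\u{301}"   // "e" + U+0301 COMBINING ACUTE ACCENT

// String equality uses Unicode canonical equivalence, so these match.
let areEqual = (precomposed == decomposed)

// Each user-perceived character counts once, however it is encoded.
let precomposedCount = precomposed.count
let decomposedCount = decomposed.count

// The underlying Unicode scalars still differ in number.
let precomposedScalars = precomposed.unicodeScalars.count
let decomposedScalars = decomposed.unicodeScalars.count

print(areEqual, precomposedCount, decomposedCount,
      precomposedScalars, decomposedScalars)
// prints "true 4 4 4 5"
```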

However, along with the character definitions, Unicode also defines many rules for how to perform certain common string operations, such as uppercasing and lowercasing strings, or sorting them. These rules can be very specific, and in many cases depend entirely on context (e.g., the locale, meaning the language and region the text belongs to or is displayed in). For instance:

  • Case conversion :
    • In English, the uppercase form of i ("LATIN SMALL LETTER I" in Unicode) is I ("LATIN CAPITAL LETTER I"), and vice versa
    • In Turkish, however, the uppercase form of i is actually İ ("LATIN CAPITAL LETTER I WITH DOT ABOVE"), and the lowercase form of I ("LATIN CAPITAL LETTER I") is ı ("LATIN SMALL LETTER DOTLESS I")
  • Collation (sorting) :
    • In English, the letter Å ("LATIN CAPITAL LETTER A WITH RING ABOVE") is largely considered the same as the letter A ("LATIN CAPITAL LETTER A"), just with a modifier on it. Sorted in a list, words starting with Å would appear along with other A words, but before B words
    • In certain Scandinavian languages, however, Å is its own letter, distinct from A . In Danish and Norwegian, Å comes at the end of the alphabet: ... X, Y, Z, Æ, Ø, Å . In Swedish and Finnish, the alphabet ends with: ... X, Y, Z, Å, Ä, Ö . For these languages, words starting with Å would come after Z words in a list

In order to perform many string operations in a way that makes sense to users in various languages, those operations need to be performed within the context of their language and locale.

In the context of the documentation's description, "locale-insensitive" means that Swift strings do not apply locale-specific rules like these. The standard library instead uses Unicode's default, locale-independent case mappings (which match English conventions), and it compares and orders strings by their canonical Unicode representation rather than by any language's collation rules. So, in contexts where correct handling of these is needed (e.g., you are writing a localized app), you'll want to use the Foundation extensions to String that do take a Locale, such as uppercased(with:), lowercased(with:), and compare(_:options:range:locale:), among others.
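As a rough illustration of the difference, the sketch below contrasts the standard library's locale-insensitive case conversion with Foundation's locale-aware variants. The locale identifiers "tr_TR" and "sv_SE" and the sample words are assumptions chosen for the example. The sorted order is printed rather than asserted, since collation results depend on the platform's locale data.

```swift
import Foundation

// Assumed locale identifiers for Turkish and Swedish.
let turkish = Locale(identifier: "tr_TR")
let swedish = Locale(identifier: "sv_SE")

// Standard library: the same result in every locale.
let defaultUpper = "i".uppercased()                // "I"
let defaultLower = "I".lowercased()                // "i"

// Foundation: Turkish casing keeps the dotted/dotless distinction.
let turkishUpper = "i".uppercased(with: turkish)   // "İ" (U+0130)
let turkishLower = "I".lowercased(with: turkish)   // "ı" (U+0131)

// Foundation: sort using Swedish collation, where Å is a separate
// letter that comes after Z in the alphabet.
let words = ["Zorn", "Åre", "Arvika"]
let swedishOrder = words.sorted {
    $0.compare($1, locale: swedish) == .orderedAscending
}

print(defaultUpper, defaultLower, turkishUpper, turkishLower)
print(swedishOrder)
```

For user-visible lists, Foundation also offers localizedStandardCompare(_:), which uses the user's current locale instead of an explicit one.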

It basically just means that Swift strings are Unicode strings. A Swift string "character" is a character in a Unicode sense: a letter, an emoji, a combined letter-and-diacritic, whatever. A string can also be viewed not merely as a character sequence but as a sequence of UTF-8 or UTF-16 code units, or of Unicode scalar values (the equivalent of UTF-32). The "locale insensitive" stuff means they don't have a locale-dependent encoding, as strings did in the bad old days before Unicode.
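To make the different views concrete, here is a small sketch that counts the same one-character string in each of them. The sample string (a thumbs-up emoji with a skin-tone modifier) is only an illustration.

```swift
// One user-perceived character built from two Unicode scalars:
// U+1F44D THUMBS UP SIGN + U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4
let thumbsUp = "\u{1F44D}\u{1F3FD}"

let characterCount = thumbsUp.count                 // Characters (grapheme clusters)
let scalarCount = thumbsUp.unicodeScalars.count     // Unicode scalar values
let utf16Count = thumbsUp.utf16.count               // UTF-16 code units
let utf8Count = thumbsUp.utf8.count                 // UTF-8 code units (bytes)

print(characterCount, scalarCount, utf16Count, utf8Count)
// prints "1 2 4 8"
```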

This is delightful but it has some downsides, most notably that strings qua character-sequence are not directly indexable by integers.
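In practice this means a subscript such as word[2] does not compile; you move a String.Index by an offset instead. The sketch below shows one way to do that. The helper character(at:in:) is hypothetical, written for this example, and is not part of the standard library.

```swift
let word = "Swift"

// Walk from startIndex to the third Character. This takes linear time,
// because Characters vary in encoded length.
let thirdIndex = word.index(word.startIndex, offsetBy: 2)
let third = word[thirdIndex]

// Hypothetical helper: bounds-checked lookup by integer offset.
func character(at offset: Int, in text: String) -> Character? {
    guard offset >= 0,
          let index = text.index(text.startIndex, offsetBy: offset,
                                 limitedBy: text.endIndex),
          index < text.endIndex
    else { return nil }
    return text[index]
}

// For repeated random access, copying into an Array gives Int indexing.
let characters = Array(word)
let last = characters[4]

print(third, last, character(at: 10, in: word) as Any)
```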

