
How does `JSONDecoder` know which encoding to use?

Having read Joel on Encoding like a good boy, I find myself perplexed by the workings of Foundation's JSONDecoder, neither of whose init or decode methods takes an encoding value. Looking through the docs, I see the instance property dataDecodingStrategy; is this perhaps where the encoding-guessing magic happens?

Am I missing something here? Shouldn't JSONDecoder need to know the encoding of the data it receives? I realize that the JSON standard requires this data to be UTF-8 encoded, but can JSONDecoder be making that assumption? I'm confused.

RFC 8259 (from 2017) requires that

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8.

The older RFC 7159 (from 2014) and RFC 7158 (from 2013) only stated that

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

And RFC 4627 (from 2006, the oldest one that I could find):

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters, it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
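The null-pattern rule quoted above can be sketched in a few lines of Swift. This is only an illustration of the heuristic from RFC 4627, not how Foundation actually implements its detection; the function name guessJSONEncoding is a hypothetical helper, and the sketch assumes the data carries no byte order mark.

```swift
import Foundation

/// Guesses the Unicode encoding of a JSON text from the pattern of zero
/// octets in its first four bytes (the table in RFC 4627, section 3).
/// `guessJSONEncoding` is a hypothetical helper, not a Foundation API.
func guessJSONEncoding(_ data: Data) -> String.Encoding? {
    // The heuristic needs four octets; shorter input is left undecided.
    guard data.count >= 4 else { return nil }

    // `true` marks a zero octet, `false` a non-zero one.
    let isNull = data.prefix(4).map { $0 == 0 }

    switch (isNull[0], isNull[1], isNull[2], isNull[3]) {
    case (true, true, true, false):     return .utf32BigEndian    // 00 00 00 xx
    case (true, false, true, false):    return .utf16BigEndian    // 00 xx 00 xx
    case (false, true, true, true):     return .utf32LittleEndian // xx 00 00 00
    case (false, true, false, true):    return .utf16LittleEndian // xx 00 xx 00
    case (false, false, false, false):  return .utf8              // xx xx xx xx
    default:                            return nil                // not a pattern from the table
    }
}

let sample = "[1, 2, 3]".data(using: .utf16LittleEndian)!
print(guessJSONEncoding(sample) == .utf16LittleEndian) // true
```

The heuristic relies on the first two characters being ASCII, which is why it is stated for RFC 4627, where a JSON text is always an object or an array.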

JSONDecoder (which uses JSONSerialization under the hood) is able to decode UTF-8, UTF-16, and UTF-32, both little-endian and big-endian. Example:

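import Foundation

// Encode a JSON array as UTF-16 (little-endian, no byte order mark).
let data = "[1, 2, 3]".data(using: .utf16LittleEndian)!
print(data as NSData) // <5b003100 2c002000 32002c00 20003300 5d00>

// No encoding is passed; the decoder infers it from the bytes.
let a = try! JSONDecoder().decode([Int].self, from: data)
print(a) // [1, 2, 3]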

Since a JSON text whose top-level value is an array or object must start with "[" or "{", the encoding can be determined unambiguously from the first bytes of the data.

I did not find this documented though, and one probably should not rely on it. A future implementation of JSONDecoder might support only the newer standard and require UTF-8.
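If you would rather not depend on that undocumented behavior, one option is to re-encode the data as UTF-8 yourself before handing it to the decoder. The following is a minimal sketch, assuming you already know the source encoding from elsewhere (an HTTP header, a file format, etc.); decodeJSON and InvalidEncodingError are hypothetical names introduced for illustration, not Foundation APIs.

```swift
import Foundation

/// Thrown when the bytes are not valid in the declared encoding.
/// Hypothetical error type used only by this sketch.
struct InvalidEncodingError: Error {}

/// Decodes JSON whose text encoding is known up front by first
/// converting it to UTF-8. `decodeJSON` is a hypothetical helper.
func decodeJSON<T: Decodable>(_ type: T.Type,
                              from data: Data,
                              encoding: String.Encoding) throws -> T {
    // Interpret the bytes in the declared encoding; fail on invalid input.
    guard let text = String(data: data, encoding: encoding) else {
        throw InvalidEncodingError()
    }
    // A Swift String's utf8 view is always valid UTF-8, the encoding
    // that every version of the JSON specification accepts.
    return try JSONDecoder().decode(type, from: Data(text.utf8))
}

let utf16Data = "[1, 2, 3]".data(using: .utf16LittleEndian)!
let numbers = try decodeJSON([Int].self, from: utf16Data, encoding: .utf16LittleEndian)
print(numbers) // [1, 2, 3]
```

This costs an extra copy of the payload, but the decoder then only ever sees UTF-8, so the result no longer depends on encoding detection.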
