
Why does String.fromCharCode(0xd800) to String.fromCharCode(0xdfff) return the replacement character?

Why does this happen:

> String.fromCharCode(0xd7FF)
'퟿'
> String.fromCharCode(0xd800)
'�'
> String.fromCharCode(0xdffe) // (and everything in between)
'�'
> String.fromCharCode(0xdfff)
'�'
> String.fromCharCode(0xe000)
''

D800₁₆ is 55296₁₀ and DFFF₁₆ is 57343₁₀. I get the same results with String.fromCodePoint().

Code points U+D800 to U+DFFF are reserved for the UTF-16 encoding of surrogates . Effectively, these are characters which are never valid individually - they always come in surrogate pairs - a high surrogate followed by a low surrogate. (Confusingly, the "high surrogate" range is the range U+D800 to U+DBFF, and the "low surrogate" range is the range U+DC00 to U+DFFF.)

This pair of characters is combined in UTF-16 to represent a single character outside the Basic Multilingual Plane.
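As a sketch of how that combination works (the helper name `combineSurrogates` is my own, not a built-in): UTF-16 maps a high/low pair back to a code point with the formula 0x10000 + (high − 0xD800) × 0x400 + (low − 0xDC00).

```javascript
// Combine a UTF-16 surrogate pair into a single code point.
// high must be in U+D800..U+DBFF, low in U+DC00..U+DFFF.
function combineSurrogates(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

// U+1F600 (😀) is outside the BMP; UTF-16 encodes it as the pair 0xD83D 0xDE00.
const emoji = String.fromCharCode(0xd83d, 0xde00);
console.log(emoji);                                          // 😀
console.log(combineSurrogates(0xd83d, 0xde00).toString(16)); // "1f600"
console.log(emoji.codePointAt(0).toString(16));              // "1f600"
```

Note that `String.fromCharCode` with the two surrogate code units and `String.fromCodePoint(0x1f600)` produce the same two-code-unit string.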

Outside this special meaning in UTF-16, these aren't valid characters on their own. So when you print such a string, there is no character to render, and the console shows the Unicode replacement character (U+FFFD) instead.
