I'm trying to find any information about the encoding behind L"" strings.
https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-160
I know the encoding of wchar_t is left undefined, because it can be any wide-character encoding. But what happens if I use an L"" string? Even the docs leave that information out.
auto s2 = L"hello"; // const wchar_t* <-- it's undefined but why?
auto s3 = u"hello"; // const char16_t*, encoded as UTF-16
auto s4 = U"hello"; // const char32_t*, encoded as UTF-32
wchar_t is a standard type, but its exact implementation is left to individual compilers. Microsoft decided, back when all of Unicode fit into 16 bits, that wchar_t would be 2 bytes in size and that Windows would use UCS-2. Later, when Unicode grew beyond 16 bits, Windows was updated to use UTF-16, and since Windows ran on little-endian processors, that made it UTF-16LE. wchar_t remained 2 bytes in size, which can hold UTF-16 code units, using surrogate pairs for Unicode code points above U+FFFF. So with Microsoft's compiler, an L"" literal is encoded as UTF-16.
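To see this on a given compiler, here is a minimal sketch that counts how many code units each literal kind uses for one code point above U+FFFF (U+1F600). The function names are made up for illustration, and `expected_wide_units` rests on an assumption: that a 2-byte wchar_t means UTF-16 (as with MSVC) and any other size means one unit per code point (as with typical GCC/Clang targets on Linux, where wchar_t is 4 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+1F600 lies above U+FFFF, so UTF-16 needs a surrogate pair for it.
// Each function returns the number of code units the literal occupies.
std::size_t wide_units()  { return std::wstring(L"\U0001F600").size(); }
std::size_t utf16_units() { return std::u16string(u"\U0001F600").size(); }
std::size_t utf32_units() { return std::u32string(U"\U0001F600").size(); }

// Hypothetical helper, based on the assumption stated above:
// 2-byte wchar_t -> UTF-16 (two units), otherwise one unit per code point.
std::size_t expected_wide_units() {
    return sizeof(wchar_t) == 2 ? 2 : 1;
}

// Checks that the three literal kinds behave as described.
void check_literals() {
    assert(utf16_units() == 2);  // u"" is always UTF-16: surrogate pair
    assert(utf32_units() == 1);  // U"" is always UTF-32: one unit
    assert(wide_units() == expected_wide_units());  // L"" depends on the compiler
}
```

Calling `check_literals()` from `main` should pass on both kinds of platform. Only the result of `wide_units()` differs between them, and that difference is why the docs cannot state a single encoding for L"" the way they do for u"" and U"".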