
What is the encoding behind L"" in Windows?

I'm trying to find information about the encoding behind L"" strings.

https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-160

I know the encoding of wchar_t isn't fixed by the standard, because an implementation can pick any wide encoding for it. But what happens if I use an L"" string? Even the docs leave this information out.

auto s2 =  L"hello"; // const wchar_t* <-- it's undefined but why?
auto s3 =  u"hello"; // const char16_t*, encoded as UTF-16
auto s4 =  U"hello"; // const char32_t*, encoded as UTF-32

wchar_t is a standard type, but its size and encoding are implementation-defined, so each compiler chooses them. Microsoft made its choice when all of Unicode still fit in 16 bits: wchar_t would be 2 bytes, and Windows would use UCS-2. Later, when Unicode grew beyond 16 bits, Windows was updated to use UTF-16, and since Windows runs on little-endian processors, that means UTF-16LE in practice. wchar_t stayed at 2 bytes, which is enough to hold UTF-16 code units, with surrogate pairs representing Unicode code points above U+FFFF. So with MSVC on Windows, an L"" literal is encoded as UTF-16LE.
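To make the surrogate-pair point concrete, here is a minimal sketch. The helper to_utf16 and the constant wide_units are hypothetical names introduced for illustration, not part of any Windows or standard API. The helper assumes its input is a valid code point and does no error checking. The constant just asks the compiler how many wchar_t units it used for U+1F600 in an L"" literal, which should be 2 where wchar_t is 16 bits (as with MSVC) and 1 where it is 32 bits (as is typical on Linux).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid code point (no checks for surrogates or > U+10FFFF).
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Above U+FFFF: split the remaining 20 bits into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}

// How many wchar_t code units the compiler emitted for U+1F600 in an L"" literal
// (array length minus the terminating null).
constexpr std::size_t wide_units = sizeof(L"\U0001F600") / sizeof(wchar_t) - 1;
```

Calling to_utf16(0x1F600) yields the two code units 0xD83D and 0xDE00. On a compiler with a 2-byte wchar_t, the L"" literal above should contain those same two units, which is why wide_units depends on sizeof(wchar_t).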
