What Unicode encoding (UTF-8, UTF-16, other) does Windows use for its Unicode data types?

Question

There are different encodings of the same (standardized) Unicode table. For example, in UTF-8 encoding A corresponds to 0x0041, but in UTF-16 encoding the same A is represented as 0xfeff0041.

From this brilliant article I have learned that when I program in C++ for the Windows platform and deal with Unicode, I should know that it is represented in 2 bytes. But it does not say anything about the encoding. (It even says that x86 CPUs are little-endian, so I know how those two bytes are stored in memory.) But I should also know the encoding of the Unicode so that I have complete information about how the symbols are stored in memory. Is there any fixed Unicode encoding for C++/Windows programmers?

Answer 1

The values stored in memory for Windows are UTF-16 little-endian, always. But that's not what you're talking about; you're looking at file contents. Windows itself does not specify the encoding of files; it leaves that to individual applications.
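To make the in-memory layout concrete, here is a minimal sketch that writes UTF-16 code units out as bytes in little-endian order. The function name to_utf16le_bytes is made up for this illustration; it uses the standard char16_t type rather than the Windows wchar_t, so it compiles on any platform.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: lay out UTF-16 code units as bytes, low byte first,
// which is the order a little-endian machine stores them in memory.
std::vector<std::uint8_t> to_utf16le_bytes(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}
```

For the letter A (U+0041) this yields the two bytes 0x41 0x00, with no BOM, since a BOM is a file convention and not part of an in-memory string.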

The 0xfe 0xff you see at the start of the file is a Byte Order Mark, or BOM. It not only indicates that the file is most probably Unicode, it also tells you which variant of Unicode encoding is in use.

0xfe 0xff      UTF-16 big-endian
0xff 0xfe      UTF-16 little-endian
0xef 0xbb 0xbf UTF-8
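
The table above translates fairly directly into a check on the first bytes of a file. The following is only a sketch: the Bom enum and detect_bom function are names invented here, and it covers just the three marks listed. In particular it does not distinguish UTF-32 little-endian, whose BOM (0xff 0xfe 0x00 0x00) also begins with 0xff 0xfe.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this illustration.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the leading bytes of a buffer and report which BOM, if any, is present.
Bom detect_bom(const std::vector<std::uint8_t>& data) {
    if (data.size() >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return Bom::Utf8;
    }
    if (data.size() >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return Bom::Utf16BE;
    }
    if (data.size() >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return Bom::Utf16LE;
    }
    return Bom::None;  // no BOM: the encoding has to be determined some other way
}
```

Applied to the bytes from the question (0xfe 0xff 0x00 0x41), this reports UTF-16 big-endian: the first two bytes are the BOM and the remaining 0x00 0x41 is the letter A.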

A file that doesn't have a BOM should be assumed to be 8-bit characters unless you know how it was written. That still doesn't tell you whether it's UTF-8 or some other Windows character encoding; you'll just have to guess.
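
One common way to make that guess is to test whether the bytes form well-formed UTF-8, because text in a legacy 8-bit code page that contains non-ASCII characters rarely happens to be valid UTF-8. This is a heuristic, not something the answer above prescribes, and looks_like_utf8 is a name invented for this sketch. Pure ASCII passes the check too, since it is identical in UTF-8 and in the Windows code pages.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Heuristic: returns true if the buffer is well-formed UTF-8.
// Rejects bad continuation bytes, truncated sequences, overlong forms,
// encoded surrogates, and values above U+10FFFF.
bool looks_like_utf8(const std::vector<std::uint8_t>& data) {
    // Smallest code point that legitimately needs 1, 2, 3 continuation bytes.
    static const std::uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < data.size()) {
        const std::uint8_t lead = data[i];
        if (lead < 0x80) { ++i; continue; }  // plain ASCII byte

        std::size_t extra = 0;   // number of continuation bytes expected
        std::uint32_t cp = 0;    // decoded code point
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;               // stray continuation or invalid lead byte

        if (data.size() - i <= extra) return false;  // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t c = data[i + k];
            if ((c & 0xC0) != 0x80) return false;    // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[extra]) return false;              // overlong encoding
        if (cp > 0x10FFFF) return false;                   // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate code point
        i += extra + 1;
    }
    return true;
}
```

For example, "café" saved as UTF-8 ends in 0xc3 0xa9 and passes, while the same word saved in Windows-1252 ends in a lone 0xe9 and fails.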

You may use Notepad as an example of how this is done. If the file has a BOM, Notepad will read it and process the contents appropriately. Otherwise you must specify the encoding yourself with the "Encoding" dropdown list.

Edit: the reason Windows documentation isn't more specific about the encoding is that Windows was a very early adopter of Unicode, and at the time there was only one encoding, with 16 bits per code point. When 65536 code points were determined to be inadequate, surrogate pairs were invented as a way to extend the range, and UTF-16 was born. Microsoft was already using "Unicode" to refer to their encoding and never changed.
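
As an illustration of how surrogate pairs extend the range, here is a small sketch of the standard UTF-16 encoding rule. The function name encode_utf16 is made up, and the code assumes its input is a valid Unicode scalar value (at most U+10FFFF and not itself a surrogate); it does no validation.

```cpp
#include <string>

// Encode one code point as UTF-16 code units.
// Assumption: cp is a valid scalar value; no error checking is done here.
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit, exactly as in the original 16-bit encoding.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Split the 20 bits above 0x10000 into two 10-bit halves.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

U+0041 stays a single unit 0x0041, while U+1F600 becomes the pair 0xD83D 0xDE00, which is why the number of 16-bit units in a Windows string is not always the number of characters.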

Answer 1: score 15, accepted, 2012-11-21 18:54:23
