
Is UTF-8 an Encoding or a Document Character Set?

According to the W3C Recommendation, every application requires its own document character set (not to be confused with a character encoding).

A document character set consists of:

  • A Repertoire: A set of abstract characters, such as the Latin letter "A", the Cyrillic letter "I", the Chinese character meaning "water", etc.

  • Code positions: A set of integer references to characters in the repertoire.

Each document is a sequence of characters from the repertoire.

A character encoding is how those characters may be represented.
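The distinction can be sketched in a few lines of Python. This is only an illustration, assuming Unicode as the document character set; `describe` is a hypothetical helper name, and "И" is my guess at the Cyrillic letter the quoted text means. `ord` gives the code position of a character, and `encode` gives one possible byte representation of it.

```python
def describe(ch: str) -> tuple:
    """Return (code position, UTF-8 bytes) for a single character."""
    code_position = ord(ch)          # integer reference into the repertoire
    assert chr(code_position) == ch  # the mapping is reversible
    return code_position, ch.encode("utf-8")  # UTF-8 is one encoding among several

# Latin "A", Cyrillic "И" (assumed), and the Chinese character for "water".
for ch in ["A", "И", "水"]:
    code_position, utf8_bytes = describe(ch)
    print(f"{ch}: U+{code_position:04X} -> {utf8_bytes.hex(' ')}")
```

The code position stays the same whichever encoding is chosen; only the bytes change.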

When I save a file in Windows Notepad, I am guessing that these are the "document character sets":

  • ANSI
  • UNICODE
  • UNICODE BIG ENDIAN
  • UTF-8

Three simple questions:

I want to know if those are the "document character sets". And if they are,

  1. Why is UTF-8 on the list? Isn't UTF-8 supposed to be an encoding?

    If I am not wrong about all this:

  2. Are there other document character sets that Windows does not allow you to define?

  3. How can I define other document character sets?

In my understanding:

  • ANSI is both a character set and an encoding of that character set.
  • Unicode is a character set; the encoding in question is probably UTF-16. An alternative encoding of the same character set is big-endian UTF-16, which is probably what the third option is referring to.
  • UTF-8 is an encoding of Unicode.
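To make the list above concrete, here is a rough sketch that encodes the same text under each of the four options. The mapping from Notepad's labels to Python codec names is my assumption, not something the dialog documents: "ANSI" depends on the Windows system locale (cp1252 is assumed here), and "UNICODE" is taken to mean little-endian UTF-16. Any byte order mark an editor may prepend is left out.

```python
# Hypothetical mapping from the Notepad labels to Python codec names.
# "ANSI" depends on the Windows system locale; cp1252 is assumed here.
NOTEPAD_OPTIONS = {
    "ANSI": "cp1252",
    "UNICODE": "utf-16-le",
    "UNICODE BIG ENDIAN": "utf-16-be",
    "UTF-8": "utf-8",
}

def encode_all(text: str) -> dict:
    """Encode text under each option; None if the character set lacks a character."""
    results = {}
    for label, codec in NOTEPAD_OPTIONS.items():
        try:
            results[label] = text.encode(codec)
        except UnicodeEncodeError:
            results[label] = None  # character is outside this character set
    return results

for text in ["é", "水"]:
    for label, data in encode_all(text).items():
        shown = data.hex(" ") if data is not None else "not representable"
        print(f"{text} as {label}: {shown}")
```

Under these assumptions the three Unicode options all accept "水" but produce different bytes, while the cp1252 option cannot represent it at all, which is the difference between changing the encoding and changing the character set.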

The purpose of that dropdown in the Save dialog is really to select both a character set and an encoding for it, but they've been a little careless with the naming of the options.

(Technically, though, an encoding just maps integers to byte sequences, so any encoding could be used with any character set that is small enough to "fit" the encoding. However, the UTF-* encodings are designed with Unicode in mind.)
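The "integers to byte sequences" view can be illustrated with a hand-written UTF-8 encoder. This is a minimal sketch for illustration, not how any particular editor implements it; `utf8_encode` is a hypothetical name. Note that the function only ever sees an integer and never needs to know which character that integer stands for.

```python
def utf8_encode(code_point: int) -> bytes:
    """Map one integer code point to its UTF-8 byte sequence."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:          # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# Cross-check against Python's built-in codec.
for ch in ["A", "é", "水", "😀"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

The upper bound and the excluded surrogate range are where the design "with Unicode in mind" shows up; the bit-packing itself would work for any set of integers that fits.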

Also, see Joel on Software's mandatory article on the subject.

UTF-8 is a character encoding that is also used to specify a character set for HTML and other textual documents. It is one of several Unicode encodings (UTF-16 is another).

To answer your questions:

  • It is on the list because Microsoft decided to implement it in Notepad.
  • There are many other character sets, though defining your own is not useful, so it is not really possible.
  • You can't define other character sets to save with Notepad. Try using a programmer's editor such as Notepad++, which will give you more character sets to use.
