
Why is it that JavaScript's strings use UTF-16, but one character's actual size can be just one byte?

According to this article:

Internally, JavaScript source code is treated as a sequence of UTF-16 code units.

And this IBM doc says:

UTF-16 is based on 16-bit code units. Therefore, each character can be 16 bits (2 bytes) or 32 bits (4 bytes).
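The 16-bit code unit model is directly observable from JavaScript strings, since `String.prototype.length` counts code units rather than characters. A minimal sketch (the U+10348 example character is my own pick, not from the quoted docs):

```javascript
// 'a' fits in a single 16-bit code unit; '𐍈' (U+10348, outside the
// Basic Multilingual Plane) needs two code units: a surrogate pair.
const ascii = 'a';
const gothic = '\u{10348}';

console.log(ascii.length);  // 1 code unit
console.log(gothic.length); // 2 code units (surrogate pair)

// The two code units are the high and low surrogates.
console.log(gothic.charCodeAt(0).toString(16)); // d800
console.log(gothic.charCodeAt(1).toString(16)); // df48
```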

But when I tested in Chrome's console, English letters only take 1 byte, not 2 or 4:

new Blob(['a']).size === 1

I wonder why that is the case? Am I missing something here?

Internally, JavaScript source code is treated as a sequence of UTF-16 code units.

Note that this is referring to source code, not String values. Later in the article, String values are also described as UTF-16:

When a String contains actual textual data, each element is considered to be a single UTF-16 code unit.

The discrepancy here is actually in the Blob constructor. From MDN:

Note that strings here are encoded as UTF-8, unlike the usual JavaScript UTF-16 strings.

UTF-8 uses a variable number of bytes per character.

a has a size of 1 byte, but ą, for example, takes 2:

console.log('a', new Blob(['a']).size)
console.log('ą', new Blob(['ą']).size)
