NodeJS: UTF-8 encoding a buffer, then decoding that UTF-8 string, yields a buffer with different contents

Question

I typed this into the Node.js console:

new Buffer(new Buffer([0xde]).toString('utf8'), 'utf8')

and it prints out:

<Buffer ef bf bd>

After reading the docs, I expected this to produce an identical buffer. I'm decoding a buffer whose contents are a single byte (0xde) into a string as UTF-8, then encoding that string back into a buffer as UTF-8. Am I missing something here?
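The repro above uses the new Buffer() constructor, which is deprecated in current Node.js releases. Below is a minimal sketch of the same experiment written with Buffer.from(), assuming a reasonably recent Node.js version; the variable names are illustrative only. It restates the experiment rather than changing it.

```javascript
// Same experiment as the question, using Buffer.from() instead of the
// deprecated new Buffer() constructor.
const original = Buffer.from([0xde]);

// Decode the raw bytes as UTF-8 into a JavaScript string...
const decoded = original.toString('utf8');

// ...then encode that string back into bytes as UTF-8.
const roundTripped = Buffer.from(decoded, 'utf8');

console.log(original);                      // <Buffer de>
console.log(roundTripped);                  // <Buffer ef bf bd>
console.log(original.equals(roundTripped)); // false
```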

Answer 1

For multi-byte encodings, decoding arbitrary bytes into a string and then re-encoding that string will not always give you back the bytes you started with. In UTF-8, some characters require more than one byte to be represented properly.
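To make the multi-byte point concrete, the sketch below prints how many bytes UTF-8 uses for a few sample characters (the samples are arbitrary choices). Note that U+00DE (Þ), the character the single byte 0xde stands for in Latin-1, already takes two bytes, c3 9e, in UTF-8.

```javascript
// UTF-8 byte length of a few sample characters (samples chosen arbitrarily).
const samples = ['A', '\u00de', '\u20ac', '\u{1F600}'];

for (const ch of samples) {
  const bytes = Buffer.from(ch, 'utf8');
  // Format the code point as U+XXXX for display.
  const cp = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(`U+${cp}`, bytes.length, bytes.toString('hex'));
}
// U+0041 1 41
// U+00DE 2 c39e
// U+20AC 3 e282ac
// U+1F600 4 f09f9880
```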

In your example, 0xde is greater than 0x7f, the largest value UTF-8 can encode in a single byte. Its bit pattern, 11011110, marks it as the lead byte of a two-byte sequence, so the decoder expects a continuation byte (of the form 10xxxxxx) to follow it. Your buffer ends after that one byte, so when you call .toString('utf8'), Node cannot decode it and substitutes U+FFFD, the Unicode replacement character (�), which is used to mark invalid or undecodable input. In UTF-8, U+FFFD is encoded as the three bytes 0xef, 0xbf, 0xbd. Reading this replacement character back into a new Buffer is no problem, since it is a valid character, which is why you get <Buffer ef bf bd> instead of <Buffer de>.
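The sketch below walks through that explanation. The helper expectedSequenceLength is hypothetical (it is not a Node.js API) and only inspects the high bits of a lead byte. The sketch then shows the lone 0xde decoding to U+FFFD, and 0xde followed by a continuation byte (0x9e, chosen arbitrarily) decoding to a real character that round-trips. The last part assumes the goal is to carry arbitrary bytes through a string; in that case a byte-preserving encoding such as 'latin1' round-trips where 'utf8' does not.

```javascript
// Hypothetical helper: how many bytes a UTF-8 sequence starting with `lead`
// should contain, judged only by its high bits. Simplified: it does not reject
// lead bytes that are never valid, such as 0xC0, 0xC1 and 0xF5-0xF7.
function expectedSequenceLength(lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((lead & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((lead & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0;                             // 10xxxxxx (continuation) or invalid
}

console.log(expectedSequenceLength(0xde)); // 2, since 0xde = 0b11011110

// Lead byte with no continuation byte: the decoder substitutes U+FFFD.
const lone = Buffer.from([0xde]).toString('utf8');
console.log(lone === '\ufffd');         // true
console.log(Buffer.from(lone, 'utf8')); // <Buffer ef bf bd>

// Lead byte followed by a continuation byte: a valid two-byte sequence.
const complete = Buffer.from([0xde, 0x9e]).toString('utf8');
console.log(complete.codePointAt(0).toString(16)); // 79e
console.log(Buffer.from(complete, 'utf8'));        // <Buffer de 9e>

// 'latin1' maps each byte to one character (U+0000-U+00FF), so any byte survives.
const viaLatin1 = Buffer.from(Buffer.from([0xde]).toString('latin1'), 'latin1');
console.log(viaLatin1); // <Buffer de>
```

Node's built-in decoder does the real validation; the helper only illustrates how the lead byte announces the sequence length.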

Answer 1 posted 2015-02-11 18:50:31