Node.js base64 string conversion to UTF-8 issue

Question

I have a base64 string and I need to convert it to UTF-8.

base64_string "VABpAG0AZQAgAHMAZQByAGUAaQBzAA=="

I am trying to convert base64_string to UTF-8 in the following environments:

In the browser

method : atob(base64_string)

`Result = "Time series",`

which is correct. We can verify the same at https://www.base64decode.org

In Node.js I am converting with the npm package "atob"

method : atob(base64_string)

Result = "T i m e  s e r i e s".

For some reason, I am getting a space between each character and I don't know why. I have tried to trim the result, but that does not work either.

Answer 1

TL;DR

Your string is actually UTF-16, not UTF-8. Here's how to decode it properly.

function atob(b64txt) {
  // Decode the base64 text into raw bytes.
  const buff = Buffer.from(b64txt, 'base64');
  // Interpret those bytes as UTF-16 little-endian text.
  const txt = buff.toString('utf16le');
  return txt;
}

Explanation: Your base64-encoded string isn't actually UTF-8 or ASCII data. It's UTF-16 (little-endian). That means every character takes at least two bytes: two for most characters, and four for characters outside the Basic Multilingual Plane, which are stored as surrogate pairs.

UTF-8 is different: any byte below 128 is a complete single-byte (ASCII) character. A byte of 128 or above is part of a multi-byte sequence of two to four bytes; the first byte of the sequence indicates how many bytes follow, and each following byte falls in the range 128-191.
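
To make the difference concrete, here is a small sketch that counts the bytes a few characters need in each encoding. The helper name `byteLengths` is made up for this illustration; `Buffer.byteLength` and `Buffer.from` are the standard Node.js calls.

```javascript
// Hypothetical helper: byte counts of one string in both encodings.
function byteLengths(text) {
  return {
    utf8: Buffer.byteLength(text, 'utf8'),
    utf16le: Buffer.byteLength(text, 'utf16le'),
  };
}

// 'T' (U+0054), 'é' (U+00E9), '€' (U+20AC), and an emoji outside the BMP (U+1F600).
for (const ch of ['T', '\u00e9', '\u20ac', '\u{1F600}']) {
  // Print the character, its byte counts, and its raw UTF-8 bytes.
  console.log(JSON.stringify(ch), byteLengths(ch), [...Buffer.from(ch, 'utf8')]);
}
```

An ASCII letter such as T needs one byte in UTF-8 but two in UTF-16LE, which is exactly the pattern that shows up in your data.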

So let's decode your string to character codes and see what it looks like:

const b64txt = 'VABpAG0AZQAgAHMAZQByAGUAaQBzAA==';
const buff = Buffer.from(b64txt, 'base64');
console.log(JSON.stringify(buff));
// >> {"type":"Buffer","data":[84,0,105,0,109,0,101,0,32,0,115,0,101,0,114,0,101,0,105,0,115,0]}

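The first byte (84) is the ASCII code for T. It is below 128, so in UTF-8 it would be a complete character on its own, yet it is followed by a 0 byte. Read one byte at a time, each of those 0 bytes becomes a NUL character, which is most likely what shows up as the "spaces" in your output. So... not UTF-8.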
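
Here is a rough sketch of what is probably happening in your Node.js code. I am assuming the npm package turns each byte into one character, the way the browser's atob does; I'm imitating that with the 'latin1' encoding rather than calling the package itself.

```javascript
const b64txt = 'VABpAG0AZQAgAHMAZQByAGUAaQBzAA==';
const buff = Buffer.from(b64txt, 'base64');

// Assumption: one byte becomes one character, as a browser-style atob would do.
const raw = buff.toString('latin1');

console.log(raw.length);             // 22 characters instead of 11
console.log(raw.includes('\u0000')); // true: the "spaces" are NUL characters
console.log(raw.trim().length);      // still 22: trim() only strips whitespace at the ends

// Stripping the NULs happens to work here, but only because the text is pure ASCII.
const stripped = raw.replace(/\u0000/g, '');
console.log(stripped);
```

That also explains why trimming did not help: NUL is not whitespace, and most of the NULs sit between the letters rather than at the ends.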

That's the clue that this string has two bytes per character, making it UTF-16. And the fact that the 0 follows the character is the clue that it's "little-endian": the low-order byte of each 16-bit code unit comes first, and the high-order byte comes second.

If you want to change this buffer into text, you need to interpret it as the correct type of string:
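
If you want to automate that check, something like the following could work as a rough guess. `looksLikeUtf16le` is a hypothetical helper, not part of Node.js, and it is only a heuristic for mostly-ASCII text; it will misjudge UTF-16 text written in other scripts.

```javascript
// Hypothetical helper: guesses whether a buffer holds mostly-ASCII UTF-16LE text.
function looksLikeUtf16le(buf) {
  // UTF-16 data always has an even number of bytes.
  if (buf.length === 0 || buf.length % 2 !== 0) return false;
  // A UTF-16LE byte order mark (FF FE) is a strong signal.
  if (buf[0] === 0xff && buf[1] === 0xfe) return true;
  // Otherwise count how many high-order bytes (odd positions) are zero.
  let zeroHighBytes = 0;
  for (let i = 1; i < buf.length; i += 2) {
    if (buf[i] === 0) zeroHighBytes++;
  }
  // Assumption: "more than half" is an arbitrary threshold chosen for this sketch.
  return zeroHighBytes / (buf.length / 2) > 0.5;
}

const sample = Buffer.from('VABpAG0AZQAgAHMAZQByAGUAaQBzAA==', 'base64');
console.log(looksLikeUtf16le(sample));
console.log(looksLikeUtf16le(Buffer.from('Time series!', 'utf8')));
```

When you control the producer of the data, it is more reliable to know the encoding up front than to guess it from the bytes.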

const txt = buff.toString('utf16le'); // <- UTF-16, little-endian
console.log(txt);
// >> "Time sereis"

So in Node.js, if you combine those two commands, you end up with a full-fledged solution that decodes your string properly, as in the TL;DR above.

Of course, if your encoding type changes, you'd have to change this as well, and call toString('utf8') or whatever the appropriate encoding is.

(credit: I referenced this and this as I was drafting this answer.)
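
One way to keep that flexible is to pass the encoding in as a parameter. This is just a sketch; `decodeBase64` is a name I made up, and the encoding names are the ones Node's Buffer accepts ('utf8', 'utf16le', 'latin1', and so on).

```javascript
// Hypothetical helper: decode base64 text using an explicit character encoding.
function decodeBase64(b64txt, encoding = 'utf8') {
  return Buffer.from(b64txt, 'base64').toString(encoding);
}

// The string from the question is UTF-16LE, so say so explicitly.
const fromUtf16 = decodeBase64('VABpAG0AZQAgAHMAZQByAGUAaQBzAA==', 'utf16le');
console.log(fromUtf16);

// The same text encoded as UTF-8 produces a different base64 string.
const utf8B64 = Buffer.from(fromUtf16, 'utf8').toString('base64');
console.log(utf8B64, '->', decodeBase64(utf8B64));
```

Encoding the decoded text back with Buffer.from(text, 'utf16le').toString('base64') should reproduce the original base64 string, which is a handy sanity check that you picked the right encoding.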

Answer 1
1 accepted 2020-05-26 16:37:43
