Node.js: base64 string to UTF-8 issue
I have a base64 string that I need to convert to UTF-8.
base64_string "VABpAG0AZQAgAHMAZQByAGUAaQBzAA=="
I am trying to convert base64_string to UTF-8 in the following environments:
In the browser
method : atob(base64_string)
`Result = "Time series",`
which is correct. We can verify the same at https://www.base64decode.org
In Node.js, I am converting with the npm package "atob"
method : atob(base64_string)
Result = "T i m e s e r i e s".
For some reason, I am getting spaces between the characters and I don't know why. I have tried trim, but that does not work either.
TL;DR
Your string is actually UTF-16, not UTF-8. Here's how to decode it properly.
function atob(b64txt) {
  // Decode the base64 text into raw bytes
  const buff = Buffer.from(b64txt, 'base64');
  // Interpret those bytes as UTF-16, little-endian
  const txt = buff.toString('utf16le');
  return txt;
}
Explanation: Your base64-encoded string isn't actually UTF-8 or ASCII data. It's UTF-16 (little-endian). That means every character takes at least two bytes: two for most characters, and four for those outside the Basic Multilingual Plane.
UTF-8 is different: any byte less than 128 is a complete single-byte (ASCII) character. A byte of 128 or more is part of a multi-byte sequence: the lead byte signals how many bytes the character uses (two to four), and each continuation byte that follows is in the range 128-191.
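To make the difference concrete, here is a minimal sketch that encodes the same characters both ways and prints the resulting bytes. The sample characters are my own choice for illustration; only Node's built-in Buffer is used.

```javascript
// Encode the same one-character string both ways and compare the bytes.
const utf8Bytes = Buffer.from('T', 'utf8');
const utf16Bytes = Buffer.from('T', 'utf16le');
console.log([...utf8Bytes]);  // [ 84 ]
console.log([...utf16Bytes]); // [ 84, 0 ]

// A non-ASCII character (U+20AC) needs three bytes in UTF-8 but two in UTF-16.
const euroUtf8 = Buffer.from('€', 'utf8');
const euroUtf16 = Buffer.from('€', 'utf16le');
console.log([...euroUtf8]);  // [ 226, 130, 172 ]
console.log([...euroUtf16]); // [ 172, 32 ]
```

Note how the ASCII letter gets a trailing 0 byte only in the UTF-16 encoding.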
So let's decode your string to byte values and see what it looks like:
const b64txt = 'VABpAG0AZQAgAHMAZQByAGUAaQBzAA==';
const buff = Buffer.from(b64txt, 'base64');
console.log(JSON.stringify(buff));
// >> {"type":"Buffer","data":[84,0,105,0,109,0,101,0,32,0,115,0,101,0,114,0,101,0,105,0,115,0]}
The first byte (84) is the ASCII code for T. It's less than 128, so in UTF-8 it would be a complete character on its own, yet it has a 0 byte following it. Read as UTF-8, that 0 would be a NUL character after every letter. So... not UTF-8.
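As a rough illustration of what goes wrong, the sketch below decodes the same bytes as one byte per character. This assumes the failing decoder treats the data that way (Node's 'latin1' encoding does); I have not checked what the npm package does internally.

```javascript
const bytes = Buffer.from('VABpAG0AZQAgAHMAZQByAGUAaQBzAA==', 'base64');

// Assumption: one byte per character, which is how 'latin1' decodes.
const oneBytePerChar = bytes.toString('latin1');

console.log(oneBytePerChar.length); // 22, twice the 11 visible characters
console.log(JSON.stringify(oneBytePerChar.slice(0, 4))); // "T\u0000i\u0000"

// Count the NUL characters hiding between the letters.
const nulCount = oneBytePerChar.split('\u0000').length - 1;
console.log(nulCount); // 11

// trim() only strips whitespace at the ends, and NUL is not whitespace.
const trimmed = oneBytePerChar.trim();
console.log(trimmed.length); // 22
```

Some consoles render those NUL characters as blanks, which would look like a space between each letter, and it would also explain why trimming has no effect.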
That's the clue that this string has two bytes per character, making it UTF-16. And the fact that the 0 follows the character is the clue that it's "little-endian": the low-order byte of each 16-bit code unit comes first, and the high-order byte comes second.
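The same clue can be checked in code. The helper below is hypothetical (the name looksLikeUtf16le is mine) and is only a heuristic: it assumes ASCII-range text, where every high-order byte is 0. It is not a general encoding detector.

```javascript
// Hypothetical helper: guesses UTF-16LE when every odd-indexed byte is 0.
// Only meaningful for ASCII-range text; a heuristic, not real detection.
function looksLikeUtf16le(buf) {
  if (buf.length === 0 || buf.length % 2 !== 0) return false;
  for (let i = 1; i < buf.length; i += 2) {
    if (buf[i] !== 0) return false;
  }
  return true;
}

const sample = Buffer.from('VABpAG0AZQAgAHMAZQByAGUAaQBzAA==', 'base64');
console.log(looksLikeUtf16le(sample)); // true

// Reading the first two bytes as one 16-bit code unit, in each byte order.
console.log(sample.readUInt16LE(0)); // 84 (low byte 84, high byte 0)
console.log(sample.readUInt16BE(0)); // 21504 (84 * 256, the wrong order)
```

Reading with the wrong byte order turns T (84) into code unit 21504, which is why the endianness has to match.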
If you want to turn this buffer into text, you need to interpret it with the correct encoding:
const txt = buff.toString('utf16le'); // <- UTF-16, little-endian
console.log(txt);
// >> "Time sereis"
So in Node.js, if you combine those two commands, you end up with a full-fledged solution that decodes your string properly, as shown in the TL;DR above.
Of course, if your encoding changes, you'd have to change this as well and call toString('utf8') or whatever the appropriate encoding is.
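One way to allow for that is to pass the encoding in as a parameter. This is a minimal sketch; the function name decodeBase64 and the second sample string are my own additions, not part of the question.

```javascript
// Hypothetical helper: decode base64 text using a caller-supplied encoding.
// The encoding must be one that Buffer#toString accepts ('utf8', 'utf16le', ...).
function decodeBase64(b64txt, encoding = 'utf8') {
  return Buffer.from(b64txt, 'base64').toString(encoding);
}

// The string from the question is UTF-16LE.
console.log(decodeBase64('VABpAG0AZQAgAHMAZQByAGUAaQBzAA==', 'utf16le')); // Time sereis

// A UTF-8 payload works with the default encoding.
console.log(decodeBase64('VGltZSBzZXJpZXM=')); // Time series
```

The caller still has to know which encoding the data was produced with; base64 itself carries no such information.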
(Credit: I referenced this and this as I was drafting this answer.)