Converting UTF-8 data to a proper string format

Question

If I receive a UTF-8 string via a socket (or, for that matter, from any external source), I would like to get it as a properly parsed string object. The following code shows what I mean:

var str='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';

// Find CRLF
var i=str.indexOf('\r\n');

// Parse size up until CRLF
var x=parseInt(str.slice(0, i), 10);

// Read size characters (substr counts string characters, not bytes)
var s=str.substr(i+2, x);

console.log(s);

This code should print:

Just a demo string äè

but because the UTF-8 data is not decoded, each byte of a multi-byte character is treated as a separate character, so the 21-character slice ends after the first non-ASCII character:

Just a demo string Ã¤

Does anyone have an idea how to convert this properly?

Answer 1

It seems you could use decodeURIComponent(escape(str)):

var badstr='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';

var str=decodeURIComponent(escape(badstr));

// Find CRLF
var i=str.indexOf('\r\n');

// Parse size up until CRLF
var x=parseInt(str.slice(0, i), 10);

// Read size characters (substr counts string characters, not bytes)
var s=str.substr(i+2, x);

console.log(s);

BTW, this kind of issue occurs when you mix UTF-8 with other encodings. You should check that as well.
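
The trick above works because escape() emits characters in the range 0x80 to 0xFF as %XX sequences, and decodeURIComponent() then reads those sequences as UTF-8 bytes. Since escape() is deprecated, here is a minimal sketch of the same reinterpretation without it. It assumes Node.js and that every character in the input has a code of 0xFF or below; the helper name decodeBinaryStringAsUtf8 is made up for illustration.

```javascript
// Hypothetical helper: treat each character of the input as one byte
// ('latin1' maps char codes 0x00-0xFF to the same byte values), then
// decode the resulting bytes as UTF-8. Assumes Node.js.
function decodeBinaryStringAsUtf8(binaryStr) {
  return Buffer.from(binaryStr, 'latin1').toString('utf8');
}

var rawStr = '21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';
var fixedStr = decodeBinaryStringAsUtf8(rawStr);

// Same parsing steps as above, now on the decoded string.
var crlf = fixedStr.indexOf('\r\n');
var size = parseInt(fixedStr.slice(0, crlf), 10);
var result = fixedStr.slice(crlf + 2, crlf + 2 + size);

console.log(result);
```

Note that the size prefix is applied after decoding, so it counts characters, as in the question's example.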

Answer 2

You should use utf8.js, which is available on npm.

var utf8 = require('utf8');
var encoded = '21\r\nJust a demo string \xC3\xA4\xC3\xA8-foo bar baz';
var decoded = utf8.decode(encoded);
console.log(decoded);
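
If the data is still available as raw bytes (for example a Node.js Buffer delivered by a socket), decoding the bytes directly avoids the mis-decoded intermediate string altogether. The sketch below is only an illustration under a few assumptions: Node.js with the global TextDecoder, a size prefix that counts characters as in the question, and a payload that is split by hand into two chunks to mimic a multi-byte character arriving across two socket reads.

```javascript
// Simulated payload as it would arrive on the wire (UTF-8 bytes).
var payload = Buffer.from('21\r\nJust a demo string \u00E4\u00E8-should not be anymore parsed', 'utf8');

// Split in the middle of "ä" (bytes 0xC3 0xA4) to mimic two separate reads.
var splitAt = payload.indexOf(0xC3) + 1;
var chunks = [payload.subarray(0, splitAt), payload.subarray(splitAt)];

// With { stream: true } the decoder keeps an incomplete byte sequence
// buffered until the next chunk completes it.
var decoder = new TextDecoder('utf-8');
var text = '';
chunks.forEach(function (chunk) {
  text += decoder.decode(chunk, { stream: true });
});
text += decoder.decode(); // flush any remaining buffered bytes

// Parse the size prefix and read that many characters.
var sep = text.indexOf('\r\n');
var len = parseInt(text.slice(0, sep), 10);
var body = text.slice(sep + 2, sep + 2 + len);

console.log(body);
```

Decoding chunk by chunk without the stream option would turn the split character into replacement characters, which is why the streaming form is used here.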
