
How to convert mixed ASCII and Unicode to a string in JavaScript?

I have a source string that mixes Unicode escape sequences and ASCII characters, for example:

var source = "\u5c07\u63a2\u8a0e HTML5 \u53ca\u5176\u4ed6";

How do I convert it to a string by extending the uniCodeToString function below, which I wrote myself in JavaScript? This function can convert pure Unicode to a string.

function uniCodeToString(source){
    //for example, source = "\u5c07\u63a2\u8a0e"
    var escapedSource = escape(source);
    var codeArray = escapedSource.split("%u");
    var str = "";
    for(var i=1; i<codeArray.length; i++){
        str += String.fromCharCode("0x"+codeArray[i]);
    }
    return str;
}

Use encodeURIComponent; escape was never meant for Unicode.

    var source = "\u5c07\u63a2\u8a0e HTML5 \u53ca\u5176\u4ed6";

    var enc = encodeURIComponent(source);
    // returned value: (String)
    // %E5%B0%87%E6%8E%A2%E8%A8%8E%20HTML5%20%E5%8F%8A%E5%85%B6%E4%BB%96

    decodeURIComponent(enc);
    // returned value: (String)
    // 將探討 HTML5 及其他

I think you are misunderstanding the purpose of Unicode escape sequences.

var source = "\u5c07\u63a2\u8a0e HTML5 \u53ca\u5176\u4ed6";

JavaScript strings are always Unicode (each code unit is a 16-bit UTF-16 value). The purpose of the escapes is to let you write characters that are unsupported by the encoding used to save the source file (e.g. the HTML page or .js file is encoded as ISO-8859-1) or to work around things like keyboard limitations. This is no different from using \n to indicate a linefeed code point.
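As a minimal sketch of that point, the two spellings below should produce the same string value, so no conversion step is involved. The variable names are only illustrative, and the example assumes the file is saved in an encoding (such as UTF-8) that can hold the literal characters.

```javascript
// Two spellings of the same string: one with escapes, one with literal characters.
var escaped = "\u5c07\u63a2\u8a0e HTML5 \u53ca\u5176\u4ed6";
var literal = "將探討 HTML5 及其他";

// The escapes are resolved when the source is parsed, before the string exists at runtime.
var same = (escaped === literal);
var lengths = [escaped.length, literal.length];
console.log(same, lengths);
```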

The above string ("將探討 HTML5 及其他") is made up of the values 5c07 63a2 8a0e 0020 0048 0054 004d 004c 0035 0020 53ca 5176 4ed6 whether you write the sequence as literal characters or as escape sequences.
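One way to inspect those values yourself is to dump each UTF-16 code unit with charCodeAt. This is only a sketch; codeUnits is a hypothetical helper name, not a built-in.

```javascript
// List the UTF-16 code units of a string as 4-digit hex values.
// codeUnits is a hypothetical helper name, not part of any library.
function codeUnits(str) {
    var units = [];
    for (var i = 0; i < str.length; i++) {
        var hex = str.charCodeAt(i).toString(16);
        // Left-pad to four digits so 0x20 is shown as "0020".
        units.push(("0000" + hex).slice(-4));
    }
    return units.join(" ");
}

var source = "\u5c07\u63a2\u8a0e HTML5 \u53ca\u5176\u4ed6";
var dump = codeUnits(source);
console.log(dump);
```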

See the String Literals section of ECMA-262 for more details.
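A conversion only becomes necessary if the escapes reach your code as literal text, that is, an actual backslash followed by "u" and four hex digits (for instance, read from a file or a text field) rather than as part of a JavaScript string literal. A minimal sketch for that case, assuming only four-digit \uXXXX sequences appear; decodeUnicodeEscapes is a hypothetical helper name:

```javascript
// Replace each literal backslash-u-XXXX sequence with the code unit it names.
// decodeUnicodeEscapes is a hypothetical helper, shown only for illustration.
function decodeUnicodeEscapes(text) {
    return text.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

// The doubled backslashes make this a string that contains literal "\u5c07" text.
var raw = "\\u5c07\\u63a2\\u8a0e HTML5 \\u53ca\\u5176\\u4ed6";
var decoded = decodeUnicodeEscapes(raw);
console.log(decoded);
```

ASCII runs such as " HTML5 " do not match the pattern and pass through unchanged, which is what lets mixed input work.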


 