Javascript: Unicode string to hex

I'm trying to convert a Unicode string to a hexadecimal representation in JavaScript.

This is what I have:

function convertFromHex(hex) {
    var hex = hex.toString();//force conversion
    var str = '';
    for (var i = 0; i < hex.length; i += 2)
        str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
    return str;
}

function convertToHex(str) {
    var hex = '';
    for(var i=0;i<str.length;i++) {
        hex += ''+str.charCodeAt(i).toString(16);
    }
    return hex;
}

But it fails on Unicode characters, such as Chinese:

Input: 漢字

Output: ªo"[W

Any ideas? Can this be done in JavaScript?

Remember that a JavaScript code unit is 16 bits wide. Therefore the hex string form will be 4 digits per code unit.

Usage:

var str = "\u6f22\u5b57"; // "\u6f22\u5b57" === "漢字"
alert(str.hexEncode().hexDecode());

String to hex form:

String.prototype.hexEncode = function(){
    var hex, i;

    var result = "";
    for (i=0; i<this.length; i++) {
        hex = this.charCodeAt(i).toString(16);
        result += ("000"+hex).slice(-4);
    }

    return result
}

Back again:

String.prototype.hexDecode = function(){
    var j;
    var hexes = this.match(/.{1,4}/g) || [];
    var back = "";
    for(j = 0; j<hexes.length; j++) {
        back += String.fromCharCode(parseInt(hexes[j], 16));
    }

    return back;
}
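
As a rough illustration of why the group width matters, the sketch below encodes each code unit as four hex digits and then decodes the result once in groups of four and once in groups of two, as the code in the question does. The names `encodeUnits` and `decodeUnits` are made up for this example and are not part of the answer above.

```javascript
// Pad every UTF-16 code unit to exactly four hex digits.
function encodeUnits(str) {
  let hex = "";
  for (let i = 0; i < str.length; i++) {
    hex += str.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return hex;
}

// Read the hex string back in groups of `width` digits.
function decodeUnits(hex, width) {
  let str = "";
  for (let i = 0; i < hex.length; i += width) {
    str += String.fromCharCode(parseInt(hex.slice(i, i + width), 16));
  }
  return str;
}

const hex = encodeUnits("漢字");
console.log(hex);                  // "6f225b57"
console.log(decodeUnits(hex, 4));  // "漢字"
console.log(decodeUnits(hex, 2));  // 'o"[W' (each code unit was cut in half)
```

Decoding in groups of two treats 6f, 22, 5b and 57 as four separate characters, which is where the garbled output comes from.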

Here is a tweak of McDowell's algorithm that doesn't pad the result:

  function toHex(str) {
    var result = '';
    for (var i=0; i<str.length; i++) {
      result += str.charCodeAt(i).toString(16);
    }
    return result;
  }
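
One thing to keep in mind with unpadded output is that it generally cannot be decoded again, because each code unit contributes anywhere from one to four digits. A small sketch of the ambiguity, using the same loop under the hypothetical name `toHexUnpadded`:

```javascript
// Same logic as the unpadded toHex above, renamed for this example.
function toHexUnpadded(str) {
  let result = "";
  for (let i = 0; i < str.length; i++) {
    result += str.charCodeAt(i).toString(16);
  }
  return result;
}

// Three different strings, one identical hex string.
console.log(toHexUnpadded("\u1234"));       // "1234"
console.log(toHexUnpadded("\u0012\u0034")); // "1234"
console.log(toHexUnpadded("\u0001\u0234")); // "1234"
```

If the hex string only needs to go one way (for display or logging, say), this may be acceptable; for a round trip, fixed-width padding is the safer choice.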

It depends on what encoding you use. If you want to convert UTF-8 encoded hex to a string, use this:

function fromHex(hex,str){
  try{
    str = decodeURIComponent(hex.replace(/(..)/g,'%$1'))
  }
  catch(e){
    str = hex
    console.log('invalid hex input: ' + hex)
  }
  return str
}

For the other direction use this:

function toHex(str,hex){
  try{
    hex = unescape(encodeURIComponent(str))
    .split('').map(function(v){
      // pad to two digits so bytes below 0x10 (e.g. "\n") stay aligned
      return ('0' + v.charCodeAt(0).toString(16)).slice(-2)
    }).join('')
  }
  catch(e){
    hex = str
    console.log('invalid text input: ' + str)
  }
  return hex
}

A more up-to-date solution, for encoding:

// This is the same for all of the below, and
// you probably won't need it except for debugging
// in most cases.
function bytesToHex(bytes) {
  return Array.from(
    bytes,
    byte => byte.toString(16).padStart(2, "0")
  ).join("");
}

// You almost certainly want UTF-8, which is
// now natively supported:
function stringToUTF8Bytes(string) {
  return new TextEncoder().encode(string);
}

// But you might want UTF-16 for some reason.
// .charCodeAt(index) will return the underlying
// UTF-16 code-units (not code-points!), so you
// just need to format them in whichever endian order you want.
function stringToUTF16Bytes(string, littleEndian) {
  const bytes = new Uint8Array(string.length * 2);
  // Using DataView is the only way to get a specific
  // endianness.
  const view = new DataView(bytes.buffer);
  for (let i = 0; i != string.length; i++) {
    // byte offset: each UTF-16 code unit occupies 2 bytes
    view.setUint16(i * 2, string.charCodeAt(i), littleEndian);
  }
  return bytes;
}

// And you might want UTF-32 in even weirder cases.
// Fortunately, iterating a string gives the code
// points, which are identical to the UTF-32 encoding,
// though you still have the endianess issue.
function stringToUTF32Bytes(string, littleEndian) {
  const codepoints = Array.from(string, c => c.codePointAt(0));
  const bytes = new Uint8Array(codepoints.length * 4);
  // Using DataView is the only way to get a specific
  // endianness.
  const view = new DataView(bytes.buffer);
  for (let i = 0; i != codepoints.length; i++) {
    // byte offset: each UTF-32 code point occupies 4 bytes
    view.setUint32(i * 4, codepoints[i], littleEndian);
  }
  return bytes;
}

Examples:

bytesToHex(stringToUTF8Bytes("hello 漢字 👍"))
// "68656c6c6f20e6bca2e5ad9720f09f918d"
bytesToHex(stringToUTF16Bytes("hello 漢字 👍", false))
// "00680065006c006c006f00206f225b570020d83ddc4d"
bytesToHex(stringToUTF16Bytes("hello 漢字 👍", true))
// "680065006c006c006f002000226f575b20003dd84ddc"
bytesToHex(stringToUTF32Bytes("hello 漢字 👍", false))
// "00000068000000650000006c0000006c0000006f0000002000006f2200005b57000000200001f44d"
bytesToHex(stringToUTF32Bytes("hello 漢字 👍", true))
// "68000000650000006c0000006c0000006f00000020000000226f0000575b0000200000004df40100"

Decoding is generally a lot simpler; you just need:

function hexToBytes(hex) {
    const bytes = new Uint8Array(hex.length / 2);
    for (let i = 0; i !== bytes.length; i++) {
        bytes[i] = parseInt(hex.substr(i * 2, 2), 16);
    }
    return bytes;
}

Then use the encoding parameter of TextDecoder:

// UTF-8 is default
new TextDecoder().decode(hexToBytes("68656c6c6f20e6bca2e5ad9720f09f918d"));
// but you can also use:
new TextDecoder("UTF-16LE").decode(hexToBytes("680065006c006c006f002000226f575b20003dd84ddc"))
new TextDecoder("UTF-16BE").decode(hexToBytes("00680065006c006c006f00206f225b570020d83ddc4d"));
// "hello 漢字 👍"
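
If the hex string comes from an untrusted source, it may be worth validating it first: with the `hexToBytes` shown above, a pair of non-hex characters parses to NaN, which a `Uint8Array` stores as 0 without complaint. The following is a minimal sketch of a stricter variant; `hexToBytesStrict` and `hexToStringUTF8` are hypothetical names, and the `{ fatal: true }` option of `TextDecoder` is used so that invalid byte sequences throw instead of being replaced with U+FFFD.

```javascript
// Hypothetical stricter variant of hexToBytes: rejects malformed input.
function hexToBytesStrict(hex) {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new TypeError("expected an even-length string of hex digits");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// With { fatal: true }, decode() throws a TypeError on invalid UTF-8.
function hexToStringUTF8(hex) {
  const decoder = new TextDecoder("utf-8", { fatal: true });
  return decoder.decode(hexToBytesStrict(hex));
}

console.log(hexToStringUTF8("e6bca2e5ad97")); // "漢字"
```

Whether to throw or to fall back to a replacement character depends on the application; the default (non-fatal) decoder is more forgiving.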

Here's the list of allowed encoding names: https://www.w3.org/TR/encoding/#names-and-labels

You might notice UTF-32 is not on that list, which is a pain, so:

function bytesToStringUTF32(bytes, littleEndian) {
  const view = new DataView(bytes.buffer);
  const codepoints = new Uint32Array(view.byteLength / 4);
  for (let i = 0; i !== codepoints.length; i++) {
    codepoints[i] = view.getUint32(i * 4, littleEndian);
  }
  return String.fromCodePoint(...codepoints);
}

Then:

bytesToStringUTF32(hexToBytes("00000068000000650000006c0000006c0000006f0000002000006f2200005b57000000200001f44d"), false)
bytesToStringUTF32(hexToBytes("68000000650000006c0000006c0000006f00000020000000226f0000575b0000200000004df40100"), true)
// "hello 漢字 👍"

how do you get "\u6f22\u5b57" from 漢字 in JavaScript?

These are JavaScript Unicode escape sequences, e.g. \u12AB. To convert them, you could iterate over every code unit in the string, call .toString(16) on it, and go from there.

However, it is more efficient to also use hexadecimal escape sequences, e.g. \xAA, in the output wherever possible.

Also note that ASCII symbols such as A, b, and - probably don't need to be escaped.
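
The approach described above could be sketched roughly as follows. This is only an illustration and not how jsesc is implemented; the function name `toJsEscapes` and the choice to target a double-quoted string literal are assumptions made for the example.

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted
// JavaScript string literal, one UTF-16 code unit at a time.
function toJsEscapes(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    const ch = str[i];
    if (ch === "\\" || ch === '"') {
      out += "\\" + ch; // must be escaped to keep the literal valid
    } else if (code >= 0x20 && code <= 0x7e) {
      out += ch; // printable ASCII is left as-is
    } else if (code <= 0xff) {
      // short hexadecimal escape for code units up to 0xFF
      out += "\\x" + code.toString(16).toUpperCase().padStart(2, "0");
    } else {
      // four-digit Unicode escape for everything else
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

console.log(toJsEscapes("漢字"));       // \u6F22\u5B57
console.log(toJsEscapes("A\u00e9-"));   // A\xE9-
```

Because it works on code units, a character outside the Basic Multilingual Plane comes out as two \u escapes (a surrogate pair), which is still a valid string literal.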

I've written a small JavaScript library that does all this for you, called jsesc. It has lots of options to control the output.

Here's an online demo of the tool in action: http://mothereff.in/js-escapes#1%E6%BC%A2%E5%AD%97

Your question was tagged as utf-8. Reading the rest of your question, UTF-8 encoding/decoding didn't seem to be what you wanted here, but in case you ever need it: use utf8.js (online demo).

Here you go. :D

"漢字".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(4,"0"),"")
 "6f225b57"

For non-Unicode (single-byte) strings:

"hi".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(2,"0"),"")
 "6869"

ASCII (UTF-8) hex string to string

"68656c6c6f20776f726c6421".match(/.{1,2}/g).reduce((acc,char)=>acc+String.fromCharCode(parseInt(char, 16)),"")

String to ASCII (UTF-8) hex string

"hello world!".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(2,"0"),"")

--- Unicode ---

String to Unicode (UTF-16) hex string

"hello world!".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(4,"0"),"")

Unicode (UTF-16) hex string to string

"00680065006c006c006f00200077006f0072006c00640021".match(/.{1,4}/g).reduce((acc,char)=>acc+String.fromCharCode(parseInt(char, 16)),"")

Here is my take: these functions convert a UTF-8 string to proper hex without the extra zero padding. A real UTF-8 string has characters that are 1, 2, 3, or 4 bytes long.

While working on this I found a couple of key things that solved my problems:

  1. str.split('') doesn't handle characters such as emojis correctly, because it splits them into separate surrogate code units. The proper/modern way to handle this is with Array.from(str)
  2. encodeURIComponent() and decodeURIComponent() are great tools to convert between string and hex. They are pretty standard, and they handle UTF-8 correctly.
  3. (Most) ASCII characters (codes 0 - 127) don't get URI encoded, so they need to be handled separately. c.charCodeAt(0).toString(16), padded to two digits, works for those

    function utf8ToHex(str) {
      return Array.from(str).map(c =>
        // ASCII: format the code directly, padded so codes below 0x10 keep two digits
        c.charCodeAt(0) < 128 ? c.charCodeAt(0).toString(16).padStart(2, '0') :
        // everything else: let encodeURIComponent produce the UTF-8 bytes
        encodeURIComponent(c).replace(/%/g, '').toLowerCase()
      ).join('');
    }

    function hexToUtf8(hex) {
      return decodeURIComponent('%' + hex.match(/.{1,2}/g).join('%'));
    }
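
To make the first point concrete, here is a small sketch of how the two splitting approaches differ on an emoji, and why that matters for `encodeURIComponent`. The variable names are only for illustration.

```javascript
const thumbsUp = "\u{1F44D}"; // 👍, stored as two UTF-16 code units

console.log(thumbsUp.length);             // 2
console.log(thumbsUp.split("").length);   // 2 (two lone surrogates)
console.log(Array.from(thumbsUp).length); // 1 (one whole code point)

// encodeURIComponent throws a URIError on a lone surrogate,
// so per-character encoding has to work on whole code points.
const encoded = Array.from(thumbsUp)
  .map(c => encodeURIComponent(c))
  .join("");
console.log(encoded); // "%F0%9F%91%8D"
```

Stripping the percent signs and lowercasing that result gives the same four UTF-8 bytes that a `TextEncoder` would produce for this character.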

Demo: https://jsfiddle.net/lyquix/k2tjbrvq/

UTF-8 Supported Conversion

Encode

function utf8ToHex(str) {
  return Array.from(str).map(c =>
    // pad ASCII codes to two digits so codes below 0x10 stay aligned
    c.charCodeAt(0) < 128 ? c.charCodeAt(0).toString(16).padStart(2, '0') :
    encodeURIComponent(c).replace(/%/g, '').toLowerCase()
  ).join('');
}

Decode

function hexToUtf8(hex) {
  return decodeURIComponent('%' + hex.match(/.{1,2}/g).join('%'));
}
