
JavaScript: How to convert a multi-byte string array to a 32-bit int array?

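I have a string containing UTF-32 code points (though the upper 16 bits will probably always be 0). Each token is one of the 4 bytes of the code point of a character in a long string. Note that the bytes were interpreted as signed integers before being turned into a string, and I have no control over that.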

    // Provided:
    var intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars: áñá

    // Wanted
    var actualCodePoints = [225, 241, 225];
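To see why -31 stands for 225: each value is the signed (two's-complement) 8-bit reading of a byte, and 225 is 0xE1 (á). As a minimal sketch using standard typed arrays (the variable names are illustrative, not from the question), you can write the signed values into an `Int8Array` and read the same memory back through a `Uint8Array`:

```javascript
// Signed byte values as they appear in the question's string
var signedBytes = [0, 0, 0, -31, 0, 0, 0, -15, 0, 0, 0, -31];

// Int8Array stores each value as a two's-complement byte...
var asSigned = new Int8Array(signedBytes);
// ...and a Uint8Array view over the same buffer reads those bytes as unsigned
var asUnsigned = new Uint8Array(asSigned.buffer);

console.log(Array.from(asUnsigned).join(","));
// 0,0,0,225,0,0,0,241,0,0,0,225
```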

I need to convert intEncodedBytesString into the actualCodePoints array. So far I've come up with this:

var intEncodedBytesStringArray = intEncodedBytesString.toString().split(',');
var i, str = '';
var charAmount = intEncodedBytesStringArray.length / 4;

for (i = 0; i < charAmount; i++) {
  var codePoint = 0;

  for (var j = 0; j < 4; j++) {
    var num = parseInt(intEncodedBytesStringArray[i * 4 + j], 10);
    if (num != 0) {
      if (num < 0) {
        num = (1 << (8 * (4 - j))) + num;
      }

      codePoint += (num << (8 * (3 - j)));
    }
  }

  str += String.fromCodePoint(codePoint);
}

Is there a better, simpler and/or more efficient way to do this?

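I've seen dozens of answers and code snippets dealing with similar things, but none that handle my input bytes being stored in a string of signed integers :S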

EDIT: This code doesn't work for the highest code points, because 1<<32 is 1 rather than 2^32.
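The behavior in the edit comes from JavaScript's shift operators: they convert the left operand to a 32-bit integer and use only the low 5 bits of the shift count, so `1 << 32` behaves like `1 << 0`. A related pitfall is that `<<` produces a signed 32-bit result, so a first byte of 128 or more would come out negative. A small sketch demonstrating both, plus the common `>>> 0` idiom for reading the result as unsigned (the value `0xFF` in the top byte is just an illustrative input):

```javascript
// Shift counts are taken modulo 32, so shifting by 32 is a no-op
console.log(1 << 32);       // 1
console.log(1 << 31);       // -2147483648 (sign bit set)

// Packing 0xFF into the top byte gives a negative signed 32-bit result...
var packed = (0xFF << 24) | 1;
console.log(packed);        // -16777215
// ...while >>> 0 reinterprets the same 32 bits as an unsigned number
console.log(packed >>> 0);  // 4278190081
```

For real Unicode text this does not bite, since code points stop at 0x10FFFF and the top byte is therefore always 0.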

Since it's nice, simple UTF-32, yes, there's a simpler way: just work in four-byte blocks. Also, the simple way to handle possible negative values is (value + 256) % 256.
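For inputs in the signed-byte range -128..127, `(value + 256) % 256` adds 256 to negatives and leaves non-negatives unchanged. The bitwise mask `value & 0xFF` is a common equivalent (it keeps the low 8 bits of the two's-complement representation). A sketch that checks the two agree across the whole range; `toUnsignedByte` is a hypothetical helper name:

```javascript
// Hypothetical helper: convert a signed byte (-128..127) to unsigned (0..255)
function toUnsignedByte(value) {
  return (value + 256) % 256;
}

// Compare against the bitwise alternative for every possible signed byte
var mismatches = 0;
for (var v = -128; v <= 127; v++) {
  if (toUnsignedByte(v) !== (v & 0xFF)) {
    mismatches++;
  }
}

console.log(toUnsignedByte(-31)); // 225
console.log(mismatches);          // 0
```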

So, putting it together:

var intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars: áñá
var actualCodePoints = [];
var bytes = intEncodedBytesString.split(",").map(Number);
for (var i = 0; i < bytes.length; i += 4) {
  actualCodePoints.push(
       (((bytes[i]     + 256) % 256) << 24) +
       (((bytes[i + 1] + 256) % 256) << 16) +
       (((bytes[i + 2] + 256) % 256) << 8) +
       (bytes[i + 3]   + 256) % 256
  );
}
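If typed arrays are available, another option (a different technique from the arithmetic above, offered as a sketch) is to load the signed bytes into an `Int8Array` and read each four-byte group with `DataView.prototype.getUint32`, which reads big-endian by default and returns an unsigned value, so the sign handling comes for free. The function name `decodeSignedUtf32` is hypothetical, and the sketch assumes the byte count is a multiple of 4 (a trailing partial group is ignored):

```javascript
// Hypothetical helper: decode comma-separated signed bytes (big-endian UTF-32)
// into an array of code points. A trailing partial group of bytes is ignored.
function decodeSignedUtf32(intEncodedBytesString) {
  var signed = new Int8Array(intEncodedBytesString.split(",").map(Number));
  var view = new DataView(signed.buffer);
  var codePoints = [];
  for (var offset = 0; offset + 4 <= view.byteLength; offset += 4) {
    // getUint32 reads big-endian by default and returns an unsigned number
    codePoints.push(view.getUint32(offset));
  }
  return codePoints;
}

var codePoints = decodeSignedUtf32("0,0,0,-31,0,0,0,-15,0,0,0,-31");
console.log(codePoints.join(","));                           // 225,241,225
console.log(String.fromCodePoint.apply(String, codePoints)); // áñá
```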

An example, with a detailed explanation in the comments:

// Starting point
var intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars
// Target array
var actualCodePoints = [];
// Get the bytes as numbers by splitting on commas and running the array
// through Number to convert each entry to a number.
var bytes = intEncodedBytesString.split(",").map(Number);
// Loop through the bytes building code points
var i, cp;
for (i = 0; i < bytes.length; i += 4) {
  // (x + 256) % 256 will handle turning (for instance) -31 into 225.
  // We shift the value for the first byte left 24 bits, the next byte 16 bits,
  // the next 8 bits, and don't shift the last one at all. Adding them all
  // together gives us the code point, which we push into the array.
  cp = (((bytes[i]     + 256) % 256) << 24) +
       (((bytes[i + 1] + 256) % 256) << 16) +
       (((bytes[i + 2] + 256) % 256) << 8) +
        ((bytes[i + 3] + 256) % 256);
  actualCodePoints.push(cp);
}
// Show the result
console.log(actualCodePoints);
// If the JavaScript engine supports it, show the string
if (String.fromCodePoint) { // ES2015+
  var str = String.fromCodePoint.apply(String, actualCodePoints);
  // The above could be
  // `let str = String.fromCodePoint(...actualCodePoints);`
  // on an ES2015+ engine
  console.log(str);
} else {
  console.log("(Your browser doesn't support String.fromCodePoint)");
}
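Because the question mentions a long string, one caveat with `String.fromCodePoint.apply(String, array)` (and with spread) is that engines cap the number of arguments a call can take, and the cap varies by engine. A hedged sketch that converts in fixed-size slices instead; the helper name and the slice size of 4096 are arbitrary choices, not part of the answer above. Note that `String.fromCodePoint` still throws a `RangeError` for any value above 0x10FFFF:

```javascript
// Hypothetical helper: build a string from code points by calling
// String.fromCodePoint on fixed-size slices instead of the whole array at once.
function codePointsToString(codePoints) {
  var CHUNK = 4096; // arbitrary slice size
  var parts = [];
  for (var i = 0; i < codePoints.length; i += CHUNK) {
    parts.push(String.fromCodePoint.apply(String, codePoints.slice(i, i + CHUNK)));
  }
  return parts.join("");
}

console.log(codePointsToString([225, 241, 225])); // áñá

// A longer input: 10000 copies of U+00E1
var manyCodePoints = [];
for (var n = 0; n < 10000; n++) manyCodePoints.push(225);
console.log(codePointsToString(manyCodePoints).length); // 10000
```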
