
Converting a larger byte array to a string

When N is set to 125K, the following works:

let N = 125000
let x = [...Array(N)].map(( xx,i) => i)
let y = String.fromCodePoint(...x)
console.log(y.length)

When N is set to 128K, that same code breaks:

Uncaught RangeError: Maximum call stack size exceeded

This is a common operation: what is the optimal way to achieve the conversion?

Note that I did look at this related Q&A: https://stackoverflow.com/a/3195961/1056563. We should not depend on Node.js, and the approaches using fromCharCode.apply also fail. Finally, that answer is nearly ten years old.

So what is an up-to-date, performant way to handle this conversion?

The problem arises because implementations limit the number of arguments a single function call can accept. An exception is raised when too many arguments (over ~128k in this case) are passed to String.fromCodePoint via the spread operator.
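To see where the limit sits in a given environment, a small probe can help. This is only a sketch: acceptsSpreadOf is a hypothetical helper name, and the point at which it starts returning false depends on the engine and its stack size, so no particular threshold is assumed.

```javascript
// Hypothetical helper: reports whether this engine accepts `n` spread arguments.
function acceptsSpreadOf(n) {
  const codePoints = new Array(n).fill(65); // 'A' repeated n times
  try {
    return String.fromCodePoint(...codePoints).length === n;
  } catch (e) {
    if (e instanceof RangeError) return false; // argument/stack limit was hit
    throw e; // anything else is unexpected
  }
}

console.log(acceptsSpreadOf(1000));
console.log(acceptsSpreadOf(1000 * 1000)); // result depends on the engine
```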

One way to solve this problem relatively efficiently, albeit with slightly more code, is to batch the operation across multiple calls. Here is my proposed implementation, which fixes what I perceive as issues relating to scaling performance and the handling of surrogate pairs (the latter claim is incorrect: fromCodePoint doesn't care about surrogates, making it preferable to fromCharCode in such cases).

let N = 500 * 1000;
let A = [...Array(N)].map((x,i) => i); // start with "an array".

function codePointsToString(cps) {
  let rs = [];
  let batch = 32767; // Batch size: an argument count supported by 'all' browsers
  for (let i = 0; i < cps.length; ){
    let e = i + batch;
    // Build batch section, defer to Array.join.
    rs.push(String.fromCodePoint.apply(null, cps.slice(i, e)));
    i = e;
  }
  return rs.join('');
}

var result = codePointsToString(A);
console.log(result.length);

Also, I wanted a trophy. The code above should run in O(n) time and minimize the number of objects allocated. No guarantees on this being the 'best' approach. A benefit of the batching approach, and the reason the cost of apply (or spread invocation) is subsumed, is that there are significantly fewer calls to String.fromCodePoint and fewer intermediate strings. YMMV, especially across environments.
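Since the question title mentions a byte array, the same batching idea can be sketched for a Uint8Array. This variant is an assumption on my part rather than part of the benchmarked code: bytesToBinaryString is a hypothetical name, and it treats each byte as one UTF-16 code unit (no UTF-8 decoding). Using subarray instead of slice gives a view on the same buffer, which avoids copying each batch.

```javascript
// Sketch: batched conversion of a Uint8Array, one byte -> one UTF-16 code unit.
function bytesToBinaryString(bytes, batch = 32767) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += batch) {
    // subarray() returns a view on the same buffer, so no bytes are copied;
    // an end index past the array length is clamped.
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + batch)));
  }
  return parts.join('');
}

const bytes = new Uint8Array(500 * 1000).map((_, i) => i % 256);
console.log(bytesToBinaryString(bytes).length); // 500000
```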

Here is an online benchmark. All tests have access to, and use, the same generated "A" array of 500k elements.
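For anyone who wants to repeat the comparison locally, a rough harness could look like the following. It is a sketch, not the benchmark linked above: timeIt, batched and concatLoop are hypothetical names, Date.now() only has millisecond resolution, and the numbers will vary by engine and machine.

```javascript
// Hypothetical harness: runs fn(input) several times, returns the best time in ms.
function timeIt(fn, input, runs = 5) {
  let best = Infinity;
  for (let r = 0; r < runs; r++) {
    const start = Date.now();
    fn(input);
    best = Math.min(best, Date.now() - start);
  }
  return best;
}

// Batched approach: one fromCodePoint call per chunk of code points.
function batched(cps, batch = 32767) {
  const parts = [];
  for (let i = 0; i < cps.length; i += batch) {
    parts.push(String.fromCodePoint.apply(null, cps.slice(i, i + batch)));
  }
  return parts.join('');
}

// Baseline: one fromCodePoint call per element, with string concatenation.
function concatLoop(cps) {
  let s = '';
  for (const cp of cps) s += String.fromCodePoint(cp);
  return s;
}

const data = Array.from({ length: 500 * 1000 }, (_, i) => i);
console.log('batched:', timeIt(batched, data), 'ms');
console.log('concat loop:', timeIt(concatLoop, data), 'ms');
```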


The given answers perform poorly: I measured 19 seconds on one of them, and the others are similar (*). It is necessary to preallocate the output array. The following takes 20 to 40 milliseconds, three orders of magnitude faster.

function wordArrayToByteArray(hash) {
    var result = [...Array(hash.sigBytes)].map(x => -1)
    let words = hash.words
        //map each word to an array of bytes
        .map(function (v) {
            // create an array of 4 bytes (less if sigBytes says we have run out)
            var bytes = [0, 0, 0, 0].slice(0, Math.min(4, hash.sigBytes))
                // grab that section of the 4 byte word
                .map(function (d, i) {
                    return (v >>> (8 * i)) % 256;
                })
                // flip that
                .reverse()
            ;
            // remove the bytes we've processed
            // from the bytes we need to process
            hash.sigBytes -= bytes.length;
            return bytes;
        })
    words.forEach((w,i) => {
        result.splice(i * 4, 4, ...w)
    })
    result = result.map(function (d) {
        return String.fromCharCode(d);
    }).join('')
    return result
}

(*) With the possible exception of @User2864740: we are awaiting his numbers. But his solution also uses apply() inside the loop, which leads me to believe it will also be slow.

"Old-fashioned" JavaScript:

var N=125000;
var y="";
for(var i=0; i<N; i++)
  y+=String.fromCharCode(i);
console.log(y.length);

This worked with N=1000000.
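One caveat with the loop above: String.fromCharCode works on 16-bit code units, so for values above 0xFFFF (which occur with N=1000000) it keeps only the low 16 bits, whereas String.fromCodePoint emits a surrogate pair. A small sketch of the difference, using an arbitrarily chosen code point:

```javascript
const cp = 0x1F600; // a code point above 0xFFFF, chosen for illustration

const viaCharCode = String.fromCharCode(cp);   // truncated to 16 bits: 0xF600
const viaCodePoint = String.fromCodePoint(cp); // encoded as a surrogate pair

console.log(viaCharCode.length, viaCharCode.charCodeAt(0).toString(16));    // 1 f600
console.log(viaCodePoint.length, viaCodePoint.codePointAt(0).toString(16)); // 2 1f600
```

So the two functions only give the same string when every input value fits in 16 bits, as it does for plain byte values.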
