Converting a larger byte array to a string
When N is set to 125K, the following works:
let N = 125000
let x = [...Array(N)].map(( xx,i) => i)
let y = String.fromCodePoint(...x)
console.log(y.length)
When N is set to 128K, that same code breaks:
Uncaught RangeError: Maximum call stack size exceeded
This is a common operation: what is the optimal way to achieve the conversion?
Note that I did look at this related Q&A (https://stackoverflow.com/a/3195961/1056563). We should not depend on node.js, and the approaches there that use fromCharCode.apply are also failing. Finally, that answer is nearly ten years old.
So what is an up-to-date, performant way to handle this conversion?
The problem is caused by implementations limiting the number of arguments a function call can accept. Supplying too many arguments (over ~128k in this case) to the String.fromCodePoint function via the spread operator therefore raises an exception.
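To see the limit described above on a particular engine, here is a minimal sketch of a probe. The helper name findSpreadLimit and the 4M (1 << 22) search cap are illustrative assumptions, not part of any API. The number it reports depends on the engine, its configured stack size and how deep the call stack already is, so treat it as an indication rather than a constant.

```javascript
// Hypothetical helper: binary-search the largest number of arguments this
// engine accepts when an array is spread into String.fromCodePoint.
// The result depends on the engine, its stack size and the current call depth.
function findSpreadLimit(cap = 1 << 22) {
  const fits = (n) => {
    try {
      String.fromCodePoint(...new Array(n).fill(65)); // 65 is "A"
      return true;
    } catch (e) {
      if (e instanceof RangeError) return false; // too many arguments
      throw e; // anything else is a genuine bug
    }
  };
  if (fits(cap)) return cap; // no limit found below the cap
  let lo = 1;   // assumed to fit
  let hi = cap; // known to fail
  while (hi - lo > 1) {
    const mid = (lo + hi) >>> 1;
    if (fits(mid)) lo = mid;
    else hi = mid;
  }
  return lo;
}

console.log(findSpreadLimit());
```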
One way to solve this problem relatively efficiently, albeit with slightly more code, is to batch the operation across multiple calls. Here is my proposed implementation, which fixes what I perceive as issues relating to scaling performance. (I had also claimed it fixed the handling of surrogate pairs, but that is incorrect: fromCodePoint doesn't care about surrogates, which makes it preferable to fromCharCode in such cases.)
let N = 500 * 1000;
let A = [...Array(N)].map((x,i) => i); // start with "an array".
function codePointsToString(cps) {
let rs = [];
let batch = 32767; // small enough to be accepted by 'all' browsers
for (let i = 0; i < cps.length; ){
let e = i + batch;
// Build batch section, defer to Array.join.
rs.push(String.fromCodePoint.apply(null, cps.slice(i, e)));
i = e;
}
return rs.join('');
}
var result = codePointsToString(A);
console.log(result.length);
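On the surrogate remark above: every element passed to String.fromCodePoint is a whole code point, so a batch boundary can never fall inside a surrogate pair, and any batch size yields the same string as a single call. A small check, where fromCodePointsBatched is a hypothetical variant of codePointsToString with the batch size turned into a parameter:

```javascript
// Hypothetical variant of codePointsToString with a configurable batch size.
function fromCodePointsBatched(cps, batch) {
  const parts = [];
  for (let i = 0; i < cps.length; i += batch) {
    // each slice holds whole code points, so no surrogate pair is ever split
    parts.push(String.fromCodePoint(...cps.slice(i, i + batch)));
  }
  return parts.join("");
}

// "A", U+1F600 (a surrogate pair in UTF-16), "B", U+1F601
const cps = [0x41, 0x1f600, 0x42, 0x1f601];
const whole = String.fromCodePoint(...cps);
for (let batch = 1; batch <= cps.length; batch++) {
  console.log(batch, fromCodePointsBatched(cps, batch) === whole); // true for every batch size
}
```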
Also, I wanted a trophy. The code above should run in O(n) time and minimize the number of objects allocated. No guarantees on this being the 'best' approach. A benefit of the batching approach, and the reason the cost of apply (or a spread invocation) is amortized, is that there are significantly fewer calls to String.fromCodePoint and fewer intermediate strings. YMMV, especially across environments.
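To put a number on "significantly fewer calls": with the batch size B = 32767 used in codePointsToString and the N = 500 × 1000 code points from the example, the batched version makes ⌈N/B⌉ calls to String.fromCodePoint (and produces that many intermediate strings), compared with N for a one-call-per-character loop:

$$
\left\lceil \frac{N}{B} \right\rceil = \left\lceil \frac{500000}{32767} \right\rceil = 16
\qquad \text{vs.} \qquad N = 500000
$$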
Here is an online benchmark. All tests have access to, and use, the same generated "A" array of 500k elements.
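For a rough local comparison in the same spirit, here is a minimal harness that times three approaches from this thread on one shared 500k-element array and checks that they agree. It assumes a runtime with a global performance.now() (modern browsers, Node.js 16+); the function names are hypothetical, and the timings are whatever your machine prints, not results claimed here.

```javascript
const N = 500 * 1000;
const A = Array.from({ length: N }, (_, i) => i); // shared input for every approach

// Batched apply, as in codePointsToString.
function batched(cps) {
  const parts = [];
  for (let i = 0; i < cps.length; i += 32767) {
    parts.push(String.fromCodePoint.apply(null, cps.slice(i, i + 32767)));
  }
  return parts.join("");
}

// One call per element, appending to a growing string.
function concatLoop(cps) {
  let s = "";
  for (let i = 0; i < cps.length; i++) s += String.fromCodePoint(cps[i]);
  return s;
}

// One call per element into a preallocated array, joined once at the end.
function mapJoin(cps) {
  const chars = new Array(cps.length);
  for (let i = 0; i < cps.length; i++) chars[i] = String.fromCodePoint(cps[i]);
  return chars.join("");
}

const results = {};
for (const fn of [batched, concatLoop, mapJoin]) {
  const t0 = performance.now();
  results[fn.name] = fn(A);
  const ms = performance.now() - t0;
  console.log(`${fn.name}: ${ms.toFixed(1)} ms, length ${results[fn.name].length}`);
}
console.log(results.batched === results.concatLoop && results.batched === results.mapJoin);
```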
The given answers perform poorly: I measured 19 seconds on one of them, and the others are similar (*). It is necessary to preallocate the output array. The following takes 20 to 40 milliseconds, three orders of magnitude faster.
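function wordArrayToByteArray(hash) {
  // hash is assumed to be a CryptoJS WordArray: big-endian 32-bit words plus a sigBytes count.
  // Despite the name, the result is a binary string with one char per byte.
  var remaining = hash.sigBytes // local counter, so hash.sigBytes is left intact
  // preallocate the output array: one slot per significant byte
  var result = [...Array(hash.sigBytes)].map(x => -1)
  let words = hash.words
    // map each word to an array of bytes
    .map(function (v) {
      // take the 4 bytes of the word, most significant (first) byte first
      var bytes = [0, 0, 0, 0]
        .map(function (d, i) {
          return (v >>> (24 - 8 * i)) & 0xff;
        })
        // keep fewer than 4 if remaining says we have run out
        .slice(0, Math.min(4, remaining));
      // remove the bytes we've processed
      // from the bytes we need to process
      remaining -= bytes.length;
      return bytes;
    })
  // copy each word's bytes into the preallocated array
  words.forEach((w, i) => {
    result.splice(i * 4, w.length, ...w)
  })
  result = result.map(function (d) {
    return String.fromCharCode(d);
  }).join('')
  return result
}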
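The preallocation idea can also be sketched without the map/splice steps. This is not the answerer's code: wordsToBinaryString is a hypothetical helper that assumes the same CryptoJS-style layout as above (big-endian 32-bit words, possibly negative, and a sigBytes count, with at least ⌈sigBytes/4⌉ words present). It writes every byte into a preallocated Uint8Array and then decodes it in 32768-byte batches, so fromCharCode.apply never receives too many arguments.

```javascript
// Hypothetical helper: WordArray-style { words, sigBytes } -> binary string.
function wordsToBinaryString(words, sigBytes) {
  const bytes = new Uint8Array(sigBytes); // preallocated output, one slot per byte
  for (let i = 0; i < sigBytes; i++) {
    // byte i lives in word i >>> 2, most significant byte first
    bytes[i] = (words[i >>> 2] >>> (24 - 8 * (i & 3))) & 0xff;
  }
  const CHUNK = 0x8000; // 32768 arguments per apply call
  const parts = [];
  for (let i = 0; i < bytes.length; i += CHUNK) {
    // apply accepts array-likes, including typed-array views
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK)));
  }
  return parts.join("");
}

console.log(wordsToBinaryString([0x41424344, 0x45000000], 5)); // "ABCDE"
```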
(*) With the possible exception of @User2864740's answer; we are awaiting his numbers. But his solution also uses apply() inside the loop, which leads me to believe it will also be slow.
"Old-fashioned" JavaScript:
var N=125000;
var y="";
for(var i=0; i<N; i++)
y+=String.fromCharCode(i);
console.log(y.length);
Worked with N=1000000.
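One caveat on the loop above: String.fromCharCode truncates each argument to 16 bits, so for i ≥ 65536 it does not produce the same characters as the question's String.fromCodePoint call, even though y.length still comes out as N. A minimal sketch of the same one-call-per-element loop using fromCodePoint instead (codePointsToStringLoop is a hypothetical name):

```javascript
// Hypothetical helper: the "old-fashioned" loop, but with fromCodePoint.
function codePointsToStringLoop(cps) {
  let s = "";
  for (let i = 0; i < cps.length; i++) {
    // fromCodePoint emits a surrogate pair (2 UTF-16 units) for values above 0xFFFF
    s += String.fromCodePoint(cps[i]);
  }
  return s;
}

const N = 200000;
const x = Array.from({ length: N }, (_, i) => i);
const y = codePointsToStringLoop(x);
console.log(y.length); // 334464: 200000 code points, 134464 of them above 0xFFFF
console.log(String.fromCharCode(65536) === "\u0000"); // true: fromCharCode wraps to 16 bits
```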