
Split large string in n-size chunks in JavaScript

I would like to split a very large string (let's say, 10,000 characters) into N-size chunks.

What would be the best way in terms of performance to do this?

For instance: "1234567890" split by 2 would become ["12", "34", "56", "78", "90"].

Would something like this be possible using String.prototype.match and, if so, would that be the best way to do it in terms of performance?

You can do something like this:

"1234567890".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "90"]

The method will still work with strings whose size is not an exact multiple of the chunk size:

"123456789".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "9"]

In general, for any string out of which you want to extract substrings of at most n characters, you would do:

str.match(/.{1,n}/g); // Replace n with the size of the substring

If your string can contain newlines or carriage returns, you would do:

str.match(/(.|[\r\n]){1,n}/g); // Replace n with the size of the substring

As far as performance, I tried this out with approximately 10k characters and it took a little over a second on Chrome. YMMV.

This can also be used in a reusable function:

function chunkString(str, length) {
  return str.match(new RegExp('.{1,' + length + '}', 'g'));
}
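
Two caveats apply to the regex helper above: `.` does not match line terminators, and `match` returns `null` rather than an empty array when nothing matches (for example, on an empty string). A minimal sketch of a more defensive variant follows. The name `chunkStringMultiline` and the validation rule are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper: regex-based chunking that also covers newlines
// and empty input.
function chunkStringMultiline(str, length) {
  if (!Number.isInteger(length) || length < 1) {
    throw new RangeError('length must be a positive integer');
  }
  // [\s\S] matches any UTF-16 code unit, including \r and \n.
  const pattern = new RegExp('[\\s\\S]{1,' + length + '}', 'g');
  // match() returns null when there is no match, so fall back to [].
  return str.match(pattern) || [];
}

console.log(chunkStringMultiline('ab\ncd', 2)); // [ 'ab', '\nc', 'd' ]
console.log(chunkStringMultiline('', 2));       // []
```

Validating the length up front also keeps a non-numeric value from being interpolated into the pattern.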

I created several faster variants which you can see on jsPerf. My favorite one is this:

function chunkSubstr(str, size) {
  const numChunks = Math.ceil(str.length / size)
  const chunks = new Array(numChunks)

  for (let i = 0, o = 0; i < numChunks; ++i, o += size) {
    chunks[i] = str.substr(o, size)
  }

  return chunks
}

Bottom line:

  • match is very inefficient, slice is better, and on Firefox substr / substring is better still
  • match is even more inefficient for short strings (even with a cached regex, probably due to regex parsing setup time)
  • match is even more inefficient for large chunk sizes (probably due to its inability to "jump")
  • for longer strings with a very small chunk size, match outperforms slice on older IE but still loses on all other systems
  • jsperf rocks
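
Timings like these vary by engine and version, so it is worth measuring on your own target. Below is a rough sketch of such a comparison, assuming `performance.now()` is available as a global (modern browsers and recent Node.js). The helper names `viaMatch`, `viaSlice`, and `timeIt` are made up for this example, and no particular result is implied.

```javascript
function viaMatch(str, size) {
  return str.match(new RegExp('[\\s\\S]{1,' + size + '}', 'g')) || [];
}

function viaSlice(str, size) {
  const chunks = new Array(Math.ceil(str.length / size));
  for (let i = 0, o = 0; o < str.length; i++, o += size) {
    chunks[i] = str.slice(o, o + size);
  }
  return chunks;
}

// Runs fn repeatedly and returns the elapsed time in milliseconds.
function timeIt(fn, str, size, runs) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn(str, size);
  return performance.now() - start;
}

const sample = '1234567890'.repeat(1000); // 10,000 characters
for (const size of [2, 100]) {
  console.log('size', size,
    'match:', timeIt(viaMatch, sample, size, 200).toFixed(1), 'ms',
    'slice:', timeIt(viaSlice, sample, size, 200).toFixed(1), 'ms');
}
```

A loop like this is only a coarse indicator; JIT warm-up and garbage collection can easily dominate short runs.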

This is a fast and straightforward solution:

function chunkString (str, len) {
  const size = Math.ceil(str.length/len)
  const r = Array(size)
  let offset = 0

  for (let i = 0; i < size; i++) {
    r[i] = str.substr(offset, len)
    offset += len
  }

  return r
}

console.log(chunkString("helloworld", 3))
// => [ "hel", "low", "orl", "d" ]

// 10,000 char string
const bigString = "helloworld".repeat(1000)
console.time("perf")
const result = chunkString(bigString, 3)
console.timeEnd("perf")
console.log(result)
// => perf: 0.385 ms
// => [ "hel", "low", "orl", "dhe", "llo", "wor", ... ]

Surprise! You can use split to split.

var parts = "1234567890 ".split(/(.{2})/).filter(O=>O)

Results in [ '12', '34', '56', '78', '90', ' ' ]
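
The filter step is needed because split with a capturing group keeps the captured text in the output and interleaves it with empty strings. A small demonstration of the intermediate array (the variable names are illustrative only):

```javascript
const raw = '1234567890 '.split(/(.{2})/);
console.log(raw);
// [ '', '12', '', '34', '', '56', '', '78', '', '90', ' ' ]

// Dropping the empty strings leaves the chunks plus any short remainder.
const kept = raw.filter(Boolean);
console.log(kept);
// [ '12', '34', '56', '78', '90', ' ' ]
```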

var str = "123456789";
var chunks = [];
var chunkSize = 2;

while (str) {
    if (str.length < chunkSize) {
        chunks.push(str);
        break;
    }
    else {
        chunks.push(str.substr(0, chunkSize));
        str = str.substr(chunkSize);
    }
}

alert(chunks); // chunks == 12,34,56,78,9

I have written an extended function, so the chunk length can also be an array of numbers, like [1,3]:

String.prototype.chunkString = function(len) {
    var _ret;
    if (this.length < 1) {
        return [];
    }
    if (typeof len === 'number' && len > 0) {
        var _size = Math.ceil(this.length / len), _offset = 0;
        _ret = new Array(_size);
        for (var _i = 0; _i < _size; _i++) {
            _ret[_i] = this.substring(_offset, _offset = _offset + len);
        }
    }
    else if (typeof len === 'object' && len.length) {
        var n = 0, l = this.length, chunk, that = this;
        _ret = [];
        do {
            len.forEach(function(o) {
                chunk = that.substring(n, n + o);
                if (chunk !== '') {
                    _ret.push(chunk);
                    n += chunk.length;
                }
            });
            if (n === 0) {
                return undefined; // prevent an endless loop when len = [0]
            }
        } while (n < l);
    }
    return _ret;
};

The code

"1234567890123".chunkString([1,3])

will return:

[ '1', '234', '5', '678', '9', '012', '3' ]
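
If extending String.prototype is not desirable, the same idea can be sketched as a standalone function. The name `chunkByPattern` and the validation rule below are assumptions for this example; it simply cycles through the given sizes until the string is consumed.

```javascript
// Hypothetical standalone version: cycles through `sizes` until the
// string is consumed.
function chunkByPattern(str, sizes) {
  if (!Array.isArray(sizes) || sizes.length === 0 ||
      !sizes.every((n) => Number.isInteger(n) && n > 0)) {
    throw new RangeError('sizes must be a non-empty array of positive integers');
  }
  const chunks = [];
  let offset = 0;
  for (let i = 0; offset < str.length; i++) {
    const size = sizes[i % sizes.length];
    chunks.push(str.slice(offset, offset + size));
    offset += size;
  }
  return chunks;
}

console.log(chunkByPattern('1234567890123', [1, 3]));
// [ '1', '234', '5', '678', '9', '012', '3' ]
```

Rejecting zero or negative sizes up front removes the need for the endless-loop guard used in the prototype version.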

This splits a large string into smaller strings of roughly the given length, breaking only at spaces.

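function chunkSubstr(str, words) {
  // `words` is the approximate chunk length in characters;
  // chunks are only broken at spaces.
  var parts = str.split(" "), values = [], tmpVar = "";
  parts.forEach(function(value) { // plain forEach instead of jQuery's $.each
      if(tmpVar.length < words){
          tmpVar += " " + value;
      }else{
          values.push(tmpVar.replace(/\s+/g, " ").trim());
          tmpVar = value;
      }
  });
  // Flush the remainder; the original dropped the final chunk whenever
  // at least one chunk had already been stored.
  if(tmpVar.trim() !== ""){
      values.push(tmpVar.replace(/\s+/g, " ").trim());
  }
  return values;
}

var l = str.length, lc = 0, chunks = [], c = 0, chunkSize = 2;
for (; lc < l; c++) {
  chunks[c] = str.slice(lc, lc += chunkSize);
}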

I would use a regex...

var chunkStr = function(str, chunkLength) {
    return str.match(new RegExp('[\\s\\S]{1,' + +chunkLength + '}', 'g'));
}

const getChunksFromString = (str, chunkSize) => {
    var regexChunk = new RegExp(`.{1,${chunkSize}}`, 'g')   // '.' matches any character except line terminators
    return str.match(regexChunk)
}

Call it as needed:

console.log(getChunksFromString("Hello world", 3))   // ["Hel", "lo ", "wor", "ld"]

Here's a solution I came up with for template strings after a little experimenting:

Usage:

chunkString(5)`testing123`

function chunkString(nSize) {
    return (strToChunk) => {
        let result = [];
        let chars = String(strToChunk).split('');

        for(let i = 0; i < (String(strToChunk).length / nSize); i++) {
            result = result.concat(chars.slice(i*nSize,(i+1)*nSize).join(''));
        }
        return result
    }
}

document.write(chunkString(5)`testing123`);
// returns: testi,ng123

document.write(chunkString(3)`testing123`);
// returns: tes,tin,g12,3
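
A tag function actually receives an array of the template's literal parts, followed by the interpolated values as separate arguments. The version above stringifies that array, which works for a template with no `${...}` placeholders but would join the parts with commas and drop the values otherwise. A sketch of a tag that handles interpolation, with `chunkTag` as a hypothetical name:

```javascript
// Hypothetical tag factory: rebuilds the full string from the template's
// literal parts and interpolated values, then chunks it.
function chunkTag(size) {
  return (strings, ...values) => {
    let text = '';
    strings.forEach((part, i) => {
      text += part;
      if (i < values.length) text += String(values[i]);
    });
    const chunks = [];
    for (let o = 0; o < text.length; o += size) {
      chunks.push(text.slice(o, o + size));
    }
    return chunks;
  };
}

const id = 123;
console.log(chunkTag(5)`testing${id}`); // [ 'testi', 'ng123' ]
```

This sketch assumes `size` is a positive integer; a zero or negative value would keep the inner loop from advancing.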

You can use reduce() without any regex:

(str, n) => {
  return str.split('').reduce(
    (acc, rec, index) => {
      return ((index % n) || !(index)) ? acc.concat(rec) : acc.concat(',', rec)
    },
    ''
  ).split(',')
}
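
Note that the version above joins the chunks with ',' and then splits on ',' again, so an input that itself contains commas is cut in the wrong places. A sketch of the same reduce() idea that accumulates into an array instead follows; `chunkWithReduce` is a made-up name, and `n` is assumed to be a positive integer.

```javascript
// Hypothetical name; accumulates chunks in an array so commas in the
// input are preserved.
const chunkWithReduce = (str, n) =>
  str.split('').reduce((acc, ch, index) => {
    if (index % n === 0) {
      acc.push(ch);              // start a new chunk
    } else {
      acc[acc.length - 1] += ch; // extend the current chunk
    }
    return acc;
  }, []);

console.log(chunkWithReduce('a,b,c,d', 3)); // [ 'a,b', ',c,', 'd' ]
```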

You can definitely do something like

let pieces = "1234567890 ".split(/(.{2})/).filter(x => x.length == 2);

to get this:

[ '12', '34', '56', '78', '90' ]

If you want to dynamically input/adjust the chunk size so that the chunks are of size n, you can do this:

const n = 2;
let pieces = "1234567890 ".split(new RegExp("(.{"+n.toString()+"})")).filter(x => x.length == n);

To find all possible size n chunks in the original string, try this:

let subs = new Set();
let n = 2;
let str = "1234567890 ";
let regex = new RegExp("(.{"+n.toString()+"})");     //set up regex expression dynamically encoded with n

for (let i = 0; i < n; i++){               //starting from all possible offsets from position 0 in the string
    let pieces = str.split(regex).filter(x => x.length == n);    //divide the string into chunks of size n...
    for (let p of pieces)                 //...and add the chunks to the set
        subs.add(p);
    str = str.substr(1);    //shift the string reading frame
}

You should end up with a set containing these chunks (listed here in sorted order; the set's insertion order differs):

[ '12', '23', '34', '45', '56', '67', '78', '89', '90', '0 ' ]
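
The same set of windows can also be collected without a regex by sliding an index across the string one position at a time. A minimal sketch, with `allWindows` as a hypothetical helper name:

```javascript
// Hypothetical helper: collects every distinct substring of length n.
function allWindows(str, n) {
  const seen = new Set();
  for (let i = 0; i + n <= str.length; i++) {
    seen.add(str.slice(i, i + n));
  }
  return [...seen];
}

console.log(allWindows('1234567890 ', 2));
// [ '12', '23', '34', '45', '56', '67', '78', '89', '90', '0 ' ]
```

Because windows are added left to right, the result here comes out in positional order.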

This includes both left-aligned and right-aligned versions with pre-allocation. It is as fast as the RegExp implementation for small chunks, and it gets faster as the chunk size grows. It is also memory efficient.

function chunkLeft (str, size = 3) {
  if (typeof str === 'string') {
    const length = str.length
    const chunks = Array(Math.ceil(length / size))
    for (let i = 0, index = 0; index < length; i++) {
      chunks[i] = str.slice(index, index += size)
    }
    return chunks
  }
}

function chunkRight (str, size = 3) {
  if (typeof str === 'string') {
    const length = str.length
    const chunks = Array(Math.ceil(length / size))
    if (length) {
      chunks[0] = str.slice(0, length % size || size)
      for (let i = 1, index = chunks[0].length; index < length; i++) {
        chunks[i] = str.slice(index, index += size)
      }
    }
    return chunks
  }
}

console.log(chunkRight())  // undefined
console.log(chunkRight(''))  // []
console.log(chunkRight('1'))  // ["1"]
console.log(chunkRight('123'))  // ["123"]
console.log(chunkRight('1234'))  // ["1", "234"]
console.log(chunkRight('12345'))  // ["12", "345"]
console.log(chunkRight('123456'))  // ["123", "456"]
console.log(chunkRight('1234567'))  // ["1", "234", "567"]
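
Right-aligned chunking is what digit grouping needs, since the short group has to come first. The sketch below restates the idea compactly so that it runs on its own; `chunkFromRight` is a name invented for this example and is not the author's function.

```javascript
// Compact restatement of right-aligned chunking so the example runs on
// its own.
function chunkFromRight(str, size = 3) {
  const chunks = [];
  const head = str.length % size;      // length of the short leading chunk
  if (head) chunks.push(str.slice(0, head));
  for (let i = head; i < str.length; i += size) {
    chunks.push(str.slice(i, i + size));
  }
  return chunks;
}

console.log(chunkFromRight('1234567').join(',')); // 1,234,567
console.log(chunkFromRight('123456').join(','));  // 123,456
```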

In the form of a prototype function:

String.prototype.lsplit = function(){
    return this.match(new RegExp('.{1,'+ ((arguments.length==1)?(isFinite(String(arguments[0]).trim())?arguments[0]:false):1) +'}', 'g'));
}

Here is the code that I am using; it uses String.prototype.slice.

Yes, it is quite long as answers go, because it tries to follow current standards as closely as possible and of course contains a reasonable amount of JSDoc comments. However, once minified the code is only 828 bytes, and once gzipped for transmission it is only 497 bytes.

The one method that this adds to String.prototype (using Object.defineProperty where available) is:

  1. toChunks

A number of tests have been included to check the functionality.

Worried that the length of the code will affect performance? No need to worry: http://jsperf.com/chunk-string/3

Much of the extra code is there to make sure that the code responds the same way across multiple JavaScript environments.

/*jslint maxlen:80, browser:true, devel:true */

/*
 * Properties used by toChunks.
 */

/*property
    MAX_SAFE_INTEGER, abs, ceil, configurable, defineProperty, enumerable,
    floor, length, max, min, pow, prototype, slice, toChunks, value,
    writable
*/

/*
 * Properties used in the testing of toChunks implementation.
 */

/*property
    appendChild, createTextNode, floor, fromCharCode, getElementById, length,
    log, pow, push, random, toChunks
*/

(function () {
    'use strict';

    var MAX_SAFE_INTEGER = Number.MAX_SAFE_INTEGER || Math.pow(2, 53) - 1;

    /**
     * Defines a new property directly on an object, or modifies an existing
     * property on an object, and returns the object.
     *
     * @private
     * @function
     * @param {Object} object
     * @param {string} property
     * @param {Object} descriptor
     * @return {Object}
     * @see https://goo.gl/CZnEqg
     */
    function $defineProperty(object, property, descriptor) {
        if (Object.defineProperty) {
            Object.defineProperty(object, property, descriptor);
        } else {
            object[property] = descriptor.value;
        }

        return object;
    }

    /**
     * Returns true if the operands are strictly equal with no type conversion.
     *
     * @private
     * @function
     * @param {*} a
     * @param {*} b
     * @return {boolean}
     * @see http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.4
     */
    function $strictEqual(a, b) {
        return a === b;
    }

    /**
     * Returns true if the operand inputArg is undefined.
     *
     * @private
     * @function
     * @param {*} inputArg
     * @return {boolean}
     */
    function $isUndefined(inputArg) {
        return $strictEqual(typeof inputArg, 'undefined');
    }

    /**
     * The abstract operation throws an error if its argument is a value that
     * cannot be converted to an Object, otherwise returns the argument.
     *
     * @private
     * @function
     * @param {*} inputArg The object to be tested.
     * @throws {TypeError} If inputArg is null or undefined.
     * @return {*} The inputArg if coercible.
     * @see https://goo.gl/5GcmVq
     */
    function $requireObjectCoercible(inputArg) {
        var errStr;

        if (inputArg === null || $isUndefined(inputArg)) {
            errStr = 'Cannot convert argument to object: ' + inputArg;
            throw new TypeError(errStr);
        }

        return inputArg;
    }

    /**
     * The abstract operation converts its argument to a value of type string
     *
     * @private
     * @function
     * @param {*} inputArg
     * @return {string}
     * @see https://people.mozilla.org/~jorendorff/es6-draft.html#sec-tostring
     */
    function $toString(inputArg) {
        var type,
            val;

        if (inputArg === null) {
            val = 'null';
        } else {
            type = typeof inputArg;
            if (type === 'string') {
                val = inputArg;
            } else if (type === 'undefined') {
                val = type;
            } else {
                if (type === 'symbol') {
                    throw new TypeError('Cannot convert symbol to string');
                }

                val = String(inputArg);
            }
        }

        return val;
    }

    /**
     * Returns a string only if the arguments is coercible otherwise throws an
     * error.
     *
     * @private
     * @function
     * @param {*} inputArg
     * @throws {TypeError} If inputArg is null or undefined.
     * @return {string}
     */
    function $onlyCoercibleToString(inputArg) {
        return $toString($requireObjectCoercible(inputArg));
    }

    /**
     * The function evaluates the passed value and converts it to an integer.
     *
     * @private
     * @function
     * @param {*} inputArg The object to be converted to an integer.
     * @return {number} If the target value is NaN, null or undefined, 0 is
     *                  returned. If the target value is false, 0 is returned
     *                  and if true, 1 is returned.
     * @see http://www.ecma-international.org/ecma-262/5.1/#sec-9.4
     */
    function $toInteger(inputArg) {
        var number = +inputArg,
            val = 0;

        if ($strictEqual(number, number)) {
            if (!number || number === Infinity || number === -Infinity) {
                val = number;
            } else {
                val = (number > 0 || -1) * Math.floor(Math.abs(number));
            }
        }

        return val;
    }

    /**
     * The abstract operation ToLength converts its argument to an integer
     * suitable for use as the length of an array-like object.
     *
     * @private
     * @function
     * @param {*} inputArg The object to be converted to a length.
     * @return {number} If len <= +0 then +0 else if len is +INFINITY then
     *                  2^53-1 else min(len, 2^53-1).
     * @see https://people.mozilla.org/~jorendorff/es6-draft.html#sec-tolength
     */
    function $toLength(inputArg) {
        return Math.min(Math.max($toInteger(inputArg), 0), MAX_SAFE_INTEGER);
    }

    if (!String.prototype.toChunks) {
        /**
         * This method chunks a string into an array of strings of a specified
         * chunk size.
         *
         * @function
         * @this {string} The string to be chunked.
         * @param {Number} chunkSize The size of the chunks that the string will
         *                           be chunked into.
         * @returns {Array} Returns an array of the chunked string.
         */
        $defineProperty(String.prototype, 'toChunks', {
            enumerable: false,
            configurable: true,
            writable: true,
            value: function (chunkSize) {
                var str = $onlyCoercibleToString(this),
                    chunkLength = $toInteger(chunkSize),
                    chunked = [],
                    numChunks,
                    length,
                    index,
                    start,
                    end;

                if (chunkLength < 1) {
                    return chunked;
                }

                length = $toLength(str.length);
                numChunks = Math.ceil(length / chunkLength);
                index = 0;
                start = 0;
                end = chunkLength;
                chunked.length = numChunks;
                while (index < numChunks) {
                    chunked[index] = str.slice(start, end);
                    start = end;
                    end += chunkLength;
                    index += 1;
                }

                return chunked;
            }
        });
    }
}());

/*
 * Some tests
 */

(function () {
    'use strict';

    var pre = document.getElementById('out'),
        chunkSizes = [],
        maxChunkSize = 512,
        testString = '',
        maxTestString = 100000,
        chunkSize = 0,
        index = 1;

    while (chunkSize < maxChunkSize) {
        chunkSize = Math.pow(2, index);
        chunkSizes.push(chunkSize);
        index += 1;
    }

    index = 0;
    while (index < maxTestString) {
        testString += String.fromCharCode(Math.floor(Math.random() * 95) + 32);
        index += 1;
    }

    function log(result) {
        pre.appendChild(document.createTextNode(result + '\n'));
    }

    function test() {
        var strLength = testString.length,
            czLength = chunkSizes.length,
            czIndex = 0,
            czValue,
            result,
            numChunks,
            pass;

        while (czIndex < czLength) {
            czValue = chunkSizes[czIndex];
            numChunks = Math.ceil(strLength / czValue);
            result = testString.toChunks(czValue);
            czIndex += 1;
            log('chunksize: ' + czValue);
            log(' Number of chunks:');
            log('  Calculated: ' + numChunks);
            log('  Actual: ' + result.length);
            pass = result.length === numChunks;
            log(' First chunk size: ' + result[0].length);
            pass = pass && result[0].length === czValue;
            log(' Passed: ' + pass);
            log('');
        }
    }

    test();
    log('');
    log('Simple test result');
    log('abcdefghijklmnopqrstuvwxyz'.toChunks(3));
}());
 <pre id="out"></pre>

Using the slice() method:

function returnChunksArray(str, chunkSize) {
  var arr = [];
  while(str !== '') {
    arr.push(str.slice(0, chunkSize));
    str = str.slice(chunkSize);
  }
  return arr;
}

The same can be done using the substring() method.

function returnChunksArray(str, chunkSize) {
  var arr = [];
  while(str !== '') {
    arr.push(str.substring(0, chunkSize));
    str = str.substring(chunkSize);
  }
  return arr;
}
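
Both versions above build a new "rest of the string" value on every iteration. An index-based alternative avoids that, and writing it as a generator lets the caller consume chunks one at a time instead of building the whole array first. This is a sketch under the assumption that lazy consumption is useful for your case; `iterChunks` is a hypothetical name.

```javascript
// Hypothetical generator: yields one chunk at a time.
function* iterChunks(str, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  for (let offset = 0; offset < str.length; offset += size) {
    yield str.slice(offset, offset + size);
  }
}

for (const chunk of iterChunks('123456789', 4)) {
  console.log(chunk); // "1234", then "5678", then "9"
}
```

Spreading the generator, as in `[...iterChunks(str, size)]`, gives the same array as the functions above. Because generator bodies run lazily, the size check only fires once iteration starts.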

My issue with the above solution is that it breaks the string into fixed-size chunks regardless of the position in the sentences.

I think the following is a better approach, although it needs some performance tweaking:

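 static chunkString(str, length, size, delimiter = '\n') {
        // `length` is unused; it is kept so existing call sites still match.
        // Uses the native String.prototype.lastIndexOf instead of lodash.
        const result = [];
        let start = 0;
        while (start < str.length) {
            let end = str.length;
            if (end - start > size) {
                // Cut at the last delimiter in the window, or hard-cut if none.
                const lastIndex = str.lastIndexOf(delimiter, start + size);
                end = lastIndex > start ? lastIndex : start + size;
            }
            result.push(str.slice(start, end));
            // Skip the delimiter the chunk ended on, if any.
            start = str.startsWith(delimiter, end) ? end + delimiter.length : end;
        }
        return result;
    }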

Use the npm library "chkchars", but make sure the length of the given string is an exact multiple of the "number" parameter.

const phrase = "1110010111010011100101110100010000011100101110100111001011101001011101001110010111010001000001110010111010011100101110100"
const number = 7

chkchars.splitToChunks(phrase, number)

// result => ['1110010', '1110100','1110010', '1110100','0100000', '1110010','1110100', '1110010','1110100', '1011101','0011100', '1011101','0001000','0011100','1011101', '0011100','1011101']

// perf => 0.287ms

    window.format = function(b, a) {
        if (!b || isNaN(+a)) return a;
        var a = b.charAt(0) == "-" ? -a : +a,
            j = a < 0 ? a = -a : 0,
            e = b.match(/[^\d\-\+#]/g),
            h = e && e[e.length - 1] || ".",
            e = e && e[1] && e[0] || ",",
            b = b.split(h),
            a = a.toFixed(b[1] && b[1].length),
            a = +a + "",
            d = b[1] && b[1].lastIndexOf("0"),
            c = a.split(".");
        if (!c[1] || c[1] && c[1].length <= d) a = (+a).toFixed(d + 1);
        d = b[0].split(e);
        b[0] = d.join("");
        var f = b[0] && b[0].indexOf("0");
        if (f > -1)
            for (; c[0].length < b[0].length - f;) c[0] = "0" + c[0];
        else +c[0] == 0 && (c[0] = "");
        a = a.split(".");
        a[0] = c[0];
        if (c = d[1] && d[d.length -
                1].length) {
            for (var d = a[0], f = "", k = d.length % c, g = 0, i = d.length; g < i; g++) f += d.charAt(g), !((g - k + 1) % c) && g < i - c && (f += e);
            a[0] = f
        }
        a[1] = b[1] && a[1] ? h + a[1] : "";
        return (j ? "-" : "") + a[0] + a[1]
    };

var str="1234567890";
var formatstr=format( "##,###.", str);
alert(formatstr);


This will split the string starting from the right, inserting a comma after every 3 characters. You can change the position if you want.

What about this small piece of code:

function splitME(str, size) {
    let subStr = new RegExp('.{1,' + size + '}', 'g');
    return str.match(subStr);
};

function chunkString(str, length = 10) {
    let result = [],
        offset = 0;
    if (str.length <= length) return result.push(str) && result;
    while (offset < str.length) {
        result.push(str.substr(offset, length));
        offset += length;
    }
    return result;
}
