JS: split a string and return the index of each split piece

Question

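I want to split text on a regular expression and also get, for each piece, the index in the original string where that piece starts. A simple example:

"bla blabla haha".splitOnRegexWithIndex(whitespaceRegex)

The desired output is

[["bla", 0], ["blabla", 4], ["haha", 11]]

The regular expression can be anything, not just whitespace, so the delimiter does not have a fixed size.

The split is done on the regex. I don't want to use indexOf to look up "blabla" in the original string, because that would be O(n²), which is unacceptable in my scenario.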

Answer 1

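You can use replace with a callback:

let str = `bla blabla haha`;
let data = [];

// replace is used only to visit each match; its return value is ignored
str.replace(/\S+/g, (m, offset) => {
    data.push([m, offset]);
});
console.log(data);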
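One caveat with the callback signature: when the pattern contains capture groups, replace passes them between the match and the offset, so the offset is no longer the second argument. The sketch below picks the offset out by type instead (captures are strings or undefined, so the offset is the first number). The name tokensWithOffsets is made up for illustration, and like the snippet above it matches the pieces themselves rather than the separators.

```javascript
// Hypothetical helper: collect [match, offset] pairs via String.prototype.replace.
function tokensWithOffsets(str, tokenRe) {
  const data = [];
  str.replace(tokenRe, (match, ...rest) => {
    // rest is (p1, ..., pN, offset, string[, groups]); captures are strings
    // or undefined, so the first number in rest is the offset.
    const offset = rest.find((arg) => typeof arg === "number");
    data.push([match, offset]);
    return match; // keep replace a no-op on the text
  });
  return data;
}

console.log(tokensWithOffsets("bla blabla haha", /(\S)(\S*)/g));
// [["bla", 0], ["blabla", 4], ["haha", 11]]
```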

Answer 2

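Here is a possible implementation based on .exec:

function split_with_offset(str, re) {
    if (!re.global) {
        throw new Error("the regular expression must have the g flag");
    }
    let results = [];
    let m, p;
    // p records where the previous match ended, i.e. where the next piece starts
    while (p = re.lastIndex, m = re.exec(str)) {
        results.push([str.substring(p, m.index), p]);
    }
    // The tail after the last match
    results.push([str.substring(p), p]);
    return results;
}

console.log(split_with_offset("bla blabla haha", /\s+/g));
console.log(split_with_offset(" ", /\s+/g));
console.log(split_with_offset("", /\s+/g));

Caveat: the regular expression must have the g flag set.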
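If throwing on a non-global regex is inconvenient, one possible variation is to clone the regex with the g flag added. The following is a minimal sketch under a few assumptions of its own: the name splitWithOffsets is hypothetical, the regex is assumed not to use the sticky (y) flag, and zero-length matches are simply skipped (which differs from what String.prototype.split does with them).

```javascript
// Hypothetical variant: accepts global and non-global regexes alike.
function splitWithOffsets(str, re) {
  // Clone so the caller's regex and its lastIndex are left untouched;
  // add "g" when it is missing.
  const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
  const globalRe = new RegExp(re.source, flags);

  const results = [];
  let start = 0; // where the current piece begins
  let m;
  while ((m = globalRe.exec(str)) !== null) {
    if (m[0].length === 0) {
      // A zero-length match does not advance lastIndex; step manually
      // to avoid an infinite loop, and do not split here.
      globalRe.lastIndex++;
      continue;
    }
    results.push([str.slice(start, m.index), start]);
    start = m.index + m[0].length;
  }
  results.push([str.slice(start), start]); // tail after the last separator
  return results;
}

console.log(splitWithOffsets("bla blabla haha", /\s+/));
// [["bla", 0], ["blabla", 4], ["haha", 11]]
```

Tracking the end of each match explicitly, rather than reading lastIndex, keeps the offsets independent of how the regex object's state is managed.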

Answer 3

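You can call exec in a loop to iterate over the matches together with their indices:

const s = "bla blabla haha";
// exec on a global regex resumes from lastIndex and returns null when done
for (let m, reg = /\S+/g; m = reg.exec(s);) {
    console.log(m[0], m.index);
}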
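In environments that support ES2020, String.prototype.matchAll offers the same iteration without managing lastIndex by hand. A small sketch, with tokensWithIndex as a made-up name; as with the exec loop, the pattern describes the pieces to keep, not the separators, and matchAll throws a TypeError if the regex lacks the g flag.

```javascript
// Hypothetical helper built on String.prototype.matchAll (ES2020).
function tokensWithIndex(str, tokenRe) {
  // Each match object carries the matched text at [0] and its position in .index.
  return Array.from(str.matchAll(tokenRe), (m) => [m[0], m.index]);
}

console.log(tokensWithIndex("bla blabla haha", /\S+/g));
// [["bla", 0], ["blabla", 4], ["haha", 11]]
```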

Answer 4

Well, you can first use String.split() with the regular expression and then use Array.map() on the resulting array. Something like this:

function splitOnRegexWithIndex(str, regexp) {
    let offset = 0, tmp;
    // The "+ 1" assumes every separator is exactly one character long
    return str
        .split(regexp)
        .map(s => (tmp = offset, offset += s.length + 1, [s, tmp]));
}

console.log(JSON.stringify(splitOnRegexWithIndex("bla blabla haha", /\s/)));
console.log(JSON.stringify(splitOnRegexWithIndex("bla blabla haha", /b/)));


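As a caveat, though, note that the previous approach only works correctly when each separator is exactly 1 character long. However, the idea can be generalized by adding a capturing group to the split regular expression and then using Array.reduce() on the resulting array, as shown below.

function splitOnRegexWithIndex(str, regexp) {
    let offset = 0;

    // Add capturing group to the regular expression (keeping its flags).
    // This assumes regexp has no capturing groups of its own.
    regexp = new RegExp("(" + regexp.source + ")", regexp.flags);

    // Split the string using capturing group and reduce
    // the resulting array.
    return str.split(regexp).reduce((acc, s, idx) => {
        // Even indices are pieces, odd indices are the captured separators.
        if (idx % 2 === 0)
            acc.push([s, offset]);
        offset += s.length;
        return acc;
    }, []);
}

console.log(JSON.stringify(splitOnRegexWithIndex("bla blabla haha", /\s+/)));
console.log(JSON.stringify(splitOnRegexWithIndex("abaaagbacccbaaddytbax", /ba+/)));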


Answer 5

If your regular expression is not global, you get two parts: one before the first match and one after it.

function splitOnRegexWithIndex(string, regex) {
    var results = [],
        // A non-global regex is applied once; a global one until it stops matching
        cnt = regex.global ? Infinity : 1,
        m,
        offset = 0;

    while (cnt-- && (m = regex.exec(string))) {
        results.push({ index: offset, text: string.slice(offset, m.index) });
        offset = m.index + m[0].length;
    }
    results.push({ index: offset, text: string.slice(offset) });
    return results;
}

console.log(splitOnRegexWithIndex(`bla blabla haha`, /(\s+)/g));

Answer 6

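Here is my solution. It isn't pretty, but it works when the delimiters are a simple set of single characters. It does not work with lookahead, lookbehind, and so on. It could easily be modified to handle character classes. It's limited, but it gets the job done in some cases.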

function splitWithOffset(s) {  
    const wanted = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'.split('');    
    const elemf = () => ({text: '', offset: 0});
    const list = [elemf()];  
    let next = list[0];

    s.split('').forEach((c, i) => {
        if (wanted.includes(c)) {
            if (next.text === '') next.offset = i;
            next.text += c;
        } else {
            next = elemf();
            list.push(next);
        }
    });

    return list.filter(elem => elem.text !== '');
}
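As one reading of the "could easily be modified" remark, the fixed character list can be swapped for a single-character regex test. This is only a sketch: splitWithOffsetByClass and its wantedChar parameter are hypothetical names, and wantedChar is assumed to be a non-global, non-sticky regex that matches exactly one character (a global regex would make test stateful).

```javascript
// Hypothetical variant: "wanted" characters are described by a character class.
function splitWithOffsetByClass(s, wantedChar = /[A-Za-z0-9]/) {
  const list = [];
  let current = null; // the piece being built, or null while inside a delimiter run

  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (wantedChar.test(c)) {
      if (current === null) {
        // First wanted character after a delimiter: start a new piece here.
        current = { text: "", offset: i };
        list.push(current);
      }
      current.text += c;
    } else {
      current = null; // any other character ends the current piece
    }
  }
  return list;
}

console.log(splitWithOffsetByClass("bla blabla haha"));
// [{text: "bla", offset: 0}, {text: "blabla", offset: 4}, {text: "haha", offset: 11}]
```

Starting a piece only when a wanted character is seen avoids creating empty entries, so no final filter step is needed.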

Answer 7

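You can use map and indexOf to find each piece's position in the original string:

String.prototype.splitOnRegexWithIndex = function(regex){
    var splitted = this.split(regex);
    var original = String(this);

    // indexOf returns the first occurrence, so a piece that appears more
    // than once is reported at the index of its first appearance.
    return splitted.map(function(piece){
               return [piece, original.indexOf(piece)];
           });
}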

7 answers

Answer 1: score 3, 2019-07-31 16:40:54
Answer 2: score 3, accepted, 2019-07-31 16:46:54
Answer 3: score 3, 2019-07-31 16:53:40
Answer 4: score 3, 2019-07-31 17:26:40
Answer 5: score 1, 2019-07-31 17:36:10
Answer 6: score 1, 2022-06-23 15:19:20
Answer 7: score -2, 2019-07-31 16:31:06
