How can I match overlapping strings with regex?

Question

Let's say I have the string

"12345"

If I call .match(/\d{3}/g), I only get one match, "123". Why don't I get [ "123", "234", "345" ]?

Answer 1

String#match with a global-flag regex returns an array of the matched substrings. The /\d{3}/g regex matches and consumes a 3-digit sequence (that is, it reads the digits and advances its index to the position right after the last matched character). Thus, after "eating up" 123, the index is located after 3, and the only substring left to parse is 45: no match here.
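The consuming behavior can be observed directly by recording lastIndex after each exec call. This is only a minimal sketch; the variable name trace is illustrative and not part of the original answer.

```javascript
var re = /\d{3}/g;
var str = '12345';
var trace = [];
var m;

// Each successful exec moves re.lastIndex to just past the matched text.
while ((m = re.exec(str)) !== null) {
  trace.push({ match: m[0], index: m.index, lastIndex: re.lastIndex });
}

// Only one entry is recorded: after "123" the search resumes at index 3,
// where just "45" remains. A failed exec resets re.lastIndex to 0.
console.log(trace, re.lastIndex);
```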

I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead with a capturing group) to test all positions inside the input string. After each test, RegExp.lastIndex (a read/write integer property of regular expressions that specifies the index at which to start the next match) is advanced "manually" to avoid an infinite loop.

Note that this technique is implemented in .NET (Regex.Matches), Python (re.findall), PHP (preg_match_all), and Ruby (String#scan), and can be used in Java, too. Here is a demo using matchAll:

var re = /(?=(\d{3}))/g;
console.log( Array.from('12345'.matchAll(re), x => x[1]) ); // ["123", "234", "345"]

Here is an ES5 compliant demo:

var re = /(?=(\d{3}))/g;
var str = '12345';
var m, res = [];
while (m = re.exec(str)) {
    // The lookahead match is empty, so advance lastIndex by hand
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    res.push(m[1]);
}
console.log(res);

Here is a regex101.com demo

Note that the same can be written with a "regular" consuming \d{3} pattern by manually setting re.lastIndex to m.index + 1 after each successful match:

var re = /\d{3}/g;
var str = '12345';
var m, res = [];
while (m = re.exec(str)) {
    res.push(m[0]);
    re.lastIndex = m.index + 1; // <- Important
}
console.log(res);
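If the start position of each overlapping match is also needed, the same lastIndex loop can record m.index. The sketch below is one possible way to wrap it; the helper name overlappingMatchesWithIndex and its parameters are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: returns every overlapping match with its start index.
// `source` is a regex source string; a fresh global regex is built from it.
function overlappingMatchesWithIndex(str, source) {
  var re = new RegExp(source, 'g');
  var out = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    out.push({ text: m[0], index: m.index });
    // Restart one character after the start of this match,
    // not after its end, so overlapping matches are found.
    re.lastIndex = m.index + 1;
  }
  return out;
}

var found = overlappingMatchesWithIndex('12345', '\\d{3}');
console.log(found);
```

Because lastIndex always moves forward by at least one character, the loop terminates even if the pattern can match an empty string.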

Answer 2

You can't do this with a regex alone, but you can get pretty close:

var pat = /(?=(\d{3}))\d/g;
var results = [];
var match;
while ( (match = pat.exec( '1234567' ) ) != null ) {
    results.push( match[1] );
}
console.log(results);

In other words, you capture all three digits inside the lookahead, then go back and match one character in the normal way just to advance the match position. It doesn't matter how you consume that character; . works just as well as \d. And if you're really feeling adventurous, you can use just the lookahead and let JavaScript handle the bump-along.
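The three variants described above can be compared side by side. This is a sketch only: the helper collectGroup1 is a hypothetical name, and the lookahead-only variant is shown with matchAll, on the assumption that "letting JavaScript handle the bump-along" refers to built-in methods that step past empty matches on their own (a bare exec loop does not).

```javascript
// Hypothetical helper: run a global regex with exec and collect capture group 1.
// Only safe for patterns that consume at least one character per match.
function collectGroup1(re, str) {
  var out = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Consume one digit after the lookahead.
var withDigit = collectGroup1(/(?=(\d{3}))\d/g, '1234567');
// Consume any one character after the lookahead.
var withDot = collectGroup1(/(?=(\d{3}))./g, '1234567');
// Lookahead only: matchAll advances past empty matches by itself.
var lookaheadOnly = Array.from('1234567'.matchAll(/(?=(\d{3}))/g), function (x) {
  return x[1];
});

console.log(withDigit, withDot, lookaheadOnly);
```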

This code is adapted from this answer. I would have flagged this question as a duplicate of that one, but the OP accepted another, lesser answer.

Answer 3

When an expression matches, it usually consumes the characters it matched. So, after the expression matched 123, only 45 is left, which doesn't match the pattern.

Answer 4

To answer the "How": you can manually change the index of the last match (this requires a loop):

var input = '12345', 
    re = /\d{3}/g, 
    r = [], 
    m;
while (m = re.exec(input)) {
    re.lastIndex -= m[0].length - 1;
    r.push(m[0]);
}
r; // ["123", "234", "345"]

Here is a function for convenience:

function matchOverlap(input, re) {
    var r = [], m;
    // prevent infinite loops
    if (!re.global) re = new RegExp(
        re.source, (re+'').split('/').pop() + 'g'
    );
    while (m = re.exec(input)) {
        re.lastIndex -= m[0].length - 1;
        r.push(m[0]);
    }
    return r;
}

Usage examples:

matchOverlap('12345', /\D{3}/)      // []
matchOverlap('12345', /\d{3}/)      // ["123", "234", "345"]
matchOverlap('12345', /\d{3}/g)     // ["123", "234", "345"]
matchOverlap('1234 5678', /\d{3}/)  // ["123", "234", "567", "678"]
matchOverlap('LOLOL', /lol/)        // []
matchOverlap('LOLOL', /lol/i)       // ["LOL", "LOL"]

Answer 5

I would consider not using a regex for this. If you want to split into groups of three, you can just loop over the string starting at the offset:

let s = "12345"
let m = Array.from(s.slice(2), (_, i) => s.slice(i, i+3))
console.log(m)
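The same idea can be written as a plain loop with a configurable window size. This is a sketch under assumptions: the function name windows and its parameter n are hypothetical. Unlike the regex approaches, a sliding window does not check what the characters are, so a filter is added when only digit runs are wanted.

```javascript
// Hypothetical helper: every substring of length n, stepping one character at a time.
function windows(s, n) {
  var out = [];
  for (var i = 0; i + n <= s.length; i++) {
    out.push(s.slice(i, i + n));
  }
  return out;
}

var triples = windows('12345', 3);

// Windows that span the space are dropped by the digit-only filter.
var digitTriples = windows('1234 5678', 3).filter(function (w) {
  return /^\d+$/.test(w);
});

console.log(triples, digitTriples);
```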

Answer 6

Use (?=(\w{3}))

(3 being the number of letters in the sequence)
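A short sketch of how this pattern might be applied in JavaScript, assuming matchAll is available. Note that \w matches letters, digits, and the underscore, so it is broader than \d; the sample strings are illustrative only.

```javascript
var re = /(?=(\w{3}))/g;

// Capture group 1 holds the three characters seen by the lookahead.
var letters = Array.from('abcde'.matchAll(re), function (x) { return x[1]; });
var mixed = Array.from('ab_12'.matchAll(re), function (x) { return x[1]; });

console.log(letters, mixed);
```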

6 answers

Answer 1: score 24, 2015-11-24 21:08:23
Answer 2: score 23, accepted, 2013-12-30 10:25:53
Answer 3: score 14, 2013-12-30 04:28:07
Answer 4: score 7, 2013-12-30 07:47:03
Answer 5: score 0, 2018-09-13 16:45:04
Answer 6: score -1, 2017-11-08 15:58:40
