How can I match overlapping strings with regex?
Let's say I have the string "12345". If I call .match(/\d{3}/g), I only get one match, "123". Why don't I get [ "123", "234", "345" ]?
String#match with a global-flag regex returns an array of matched substrings. The /\d{3}/g regex matches and consumes (= reads into the buffer and advances its index to the position right after the last matched character) a 3-digit sequence. Thus, after "eating up" 123, the index is located after 3, and the only substring left for parsing is 45 - no match here.
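The consuming behaviour described above can be observed by stepping through the matches with exec and recording lastIndex after each one. This is only a minimal sketch; the helper name traceMatches is hypothetical and introduced purely for illustration.

```javascript
// Hypothetical helper: records each match together with the regex's
// lastIndex right after that match was found.
function traceMatches(str, re) {
  if (!re.global) {
    // Without the g flag, exec never advances and the loop would not end.
    throw new Error('traceMatches expects a regex with the g flag');
  }
  var steps = [], m;
  re.lastIndex = 0; // start scanning from the beginning of the string
  while ((m = re.exec(str)) !== null) {
    steps.push({ match: m[0], index: m.index, lastIndex: re.lastIndex });
  }
  return steps;
}

console.log(traceMatches('12345', /\d{3}/g));
// [ { match: '123', index: 0, lastIndex: 3 } ]
console.log('12345'.match(/\d{3}/g)); // [ '123' ]
```

After the first match lastIndex is 3, so the next attempt only sees 45 and fails.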
I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead with a capturing group) to test all positions inside the input string. After each test, RegExp.lastIndex (a read/write integer property of regular expressions that specifies the index at which to start the next match) is advanced "manually" to avoid an infinite loop.
Note that this technique is implemented in .NET (Regex.Matches), Python (re.findall), PHP (preg_match_all) and Ruby (String#scan), and can be used in Java, too. Here is a demo using matchAll:
var re = /(?=(\d{3}))/g;
console.log(Array.from('12345'.matchAll(re), x => x[1]));
Here is an ES5-compliant demo:
var re = /(?=(\d{3}))/g;
var str = '12345';
var m, res = [];
while (m = re.exec(str)) {
    // The match is zero-width, so bump lastIndex to avoid an infinite loop
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    res.push(m[1]);
}
console.log(res);
Here is a regex101.com demo.
Note that the same can be written with a "regular" consuming \d{3} pattern by manually setting re.lastIndex to m.index + 1 after each successful match:
var re = /\d{3}/g;
var str = '12345';
var m, res = [];
while (m = re.exec(str)) {
    res.push(m[0]);
    re.lastIndex = m.index + 1; // <- Important
}
console.log(res);
You can't do this with a regex alone, but you can get pretty close:
var pat = /(?=(\d{3}))\d/g;
var results = [];
var match;
while ((match = pat.exec('1234567')) != null) {
    results.push(match[1]);
}
console.log(results);
In other words, you capture all three digits inside the lookahead, then go back and match one character in the normal way just to advance the match position. It doesn't matter how you consume that character; . works just as well as \d. And if you're really feeling adventurous, you can use just the lookahead and let JavaScript handle the bump-along.
This code is adapted from this answer. I would have flagged this question as a duplicate of that one, but the OP accepted another, lesser answer.
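The two variations mentioned in this answer can be sketched as follows: one consumes the single character with . instead of \d, the other uses only the lookahead. The function names are hypothetical. The second variant relies on matchAll (ES2020), which steps past empty matches by itself; a bare exec loop would not, and would need lastIndex bumped by hand.

```javascript
// Variant 1: capture inside the lookahead, then consume any one character
// with `.` so the match position advances by one.
function overlapWithDot(str) {
  var pat = /(?=(\d{3}))./g, out = [], m;
  while ((m = pat.exec(str)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Variant 2: lookahead only; matchAll advances past the empty matches.
function overlapLookaheadOnly(str) {
  return Array.from(str.matchAll(/(?=(\d{3}))/g), function (m) {
    return m[1];
  });
}

console.log(overlapWithDot('1234567'));
// [ '123', '234', '345', '456', '567' ]
console.log(overlapLookaheadOnly('1234567'));
// [ '123', '234', '345', '456', '567' ]
```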
When an expression matches, it usually consumes the characters it matched. So, after the expression matched 123, only 45 is left, which doesn't match the pattern.
To answer the "How", you can manually change the index of the last match (requires a loop):
var input = '12345',
re = /\d{3}/g,
r = [],
m;
while (m = re.exec(input)) {
re.lastIndex -= m[0].length - 1;
r.push(m[0]);
}
r; // ["123", "234", "345"]
Here is a function for convenience:
function matchOverlap(input, re) {
var r = [], m;
// prevent infinite loops
if (!re.global) re = new RegExp(
re.source, (re+'').split('/').pop() + 'g'
);
while (m = re.exec(input)) {
re.lastIndex -= m[0].length - 1;
r.push(m[0]);
}
return r;
}
Usage examples:
matchOverlap('12345', /\D{3}/) // []
matchOverlap('12345', /\d{3}/) // ["123", "234", "345"]
matchOverlap('12345', /\d{3}/g) // ["123", "234", "345"]
matchOverlap('1234 5678', /\d{3}/) // ["123", "234", "567", "678"]
matchOverlap('LOLOL', /lol/) // []
matchOverlap('LOLOL', /lol/i) // ["LOL", "LOL"]
I would consider not using a regex for this. If you want to split into groups of three, you can just loop over the string starting at the offset:
let s = "12345";
let m = Array.from(s.slice(2), (_, i) => s.slice(i, i + 3));
console.log(m);
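The same idea can be generalised to any window length. The sketch below uses a hypothetical helper, slidingWindows, that is not part of the original answer. Note that, unlike the regex approaches, it does not check that the characters are digits.

```javascript
// Hypothetical helper: every substring of length n, stepping one
// character at a time. No validation of the characters is performed.
function slidingWindows(s, n) {
  var out = [];
  for (var i = 0; i + n <= s.length; i++) {
    out.push(s.slice(i, i + n));
  }
  return out;
}

console.log(slidingWindows('12345', 3)); // [ '123', '234', '345' ]
console.log(slidingWindows('12', 3));    // []
```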
Use (?=(\w{3})) (3 being the number of letters in the sequence).
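If the sequence length varies, the pattern can be built with the RegExp constructor. This is a sketch only; overlappingWords is a hypothetical name, and it assumes an environment with matchAll (ES2020).

```javascript
// Hypothetical helper: builds (?=(\w{n})) for the given n and collects
// capture group 1 from every position where the lookahead succeeds.
function overlappingWords(str, n) {
  // In a string literal the backslash must be doubled to produce \w.
  var re = new RegExp('(?=(\\w{' + n + '}))', 'g');
  return Array.from(str.matchAll(re), function (m) {
    return m[1];
  });
}

console.log(overlappingWords('12345', 3));  // [ '123', '234', '345' ]
console.log(overlappingWords('ab cde', 2)); // [ 'ab', 'cd', 'de' ]
```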