Counting overlapping matches with Regex in C#

The following code evaluates to 2 instead of 4:

Regex.Matches("020202020", "020").Count;

I'm guessing the regex starts looking for the next match from the end of the previous match. Is there any way to prevent this? I have a string of '0's and '2's and I'm trying to count how many times I have three '2's in a row, four '2's in a row, etc.

This will return 4 as you expect:

Regex.Matches("020202020", @"0(?=20)").Count;

The lookahead matches the 20 without consuming it, so the next match attempt starts at the position following the first 0. You can even do the whole regex as a lookahead:

Regex.Matches("020202020", @"(?=020)").Count;

The regex engine automatically bumps ahead one position each time it makes a zero-length match. So, to find all runs of three 2's or four 2's, you can use:

Regex.Matches("22222222", @"(?=222)").Count;  // 6

...and:

Regex.Matches("22222222", @"(?=2222)").Count;  // 5
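The lookahead trick can be wrapped in a small helper. This is only a sketch: the class name OverlapCounter and the method Count are made up for illustration, and the pattern is assumed to be a valid regex that does not match the empty string. It counts the start positions at which the pattern matches, so overlapping occurrences are included.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET library.
public static class OverlapCounter
{
    // Counts the positions in `input` where `pattern` matches,
    // including positions whose matches would overlap.
    public static int Count(string input, string pattern)
    {
        // The zero-width lookahead consumes nothing, so the engine
        // advances one character at a time between match attempts.
        string lookahead = "(?=" + pattern + ")";
        return Regex.Matches(input, lookahead).Count;
    }
}
```

For example, OverlapCounter.Count("22222222", "222") gives the same 6 as the inline call above.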

EDIT: Looking over your question again, it occurs to me you might be looking for 2's interspersed with 0's:

Regex.Matches("020202020", @"(?=20202)").Count;  // 2

If you don't know how many 0's there will be, you can use this:

Regex.Matches("020202020", @"(?=20*20*2)").Count;  // 2

And of course, you can use quantifiers to reduce repetition in the regex:

Regex.Matches("020202020", @"(?=2(?:0*2){2})").Count;  // 2
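Because a lookahead match is zero-length, Match.Value is always empty for these patterns. If you need the overlapping text as well as the count, one common approach is to put a capturing group inside the lookahead and read Groups[1]. A minimal sketch, with the hypothetical names OverlapCapture and Find:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET library.
public static class OverlapCapture
{
    // Returns the start index and text of every (possibly overlapping)
    // match of `pattern` in `input`.
    public static List<(int Index, string Value)> Find(string input, string pattern)
    {
        var results = new List<(int Index, string Value)>();
        // The outer parentheses are group 1; the lookahead itself matches
        // nothing, but the group still records what it saw.
        foreach (Match m in Regex.Matches(input, "(?=(" + pattern + "))"))
        {
            results.Add((m.Index, m.Groups[1].Value));
        }
        return results;
    }
}
```

Calling OverlapCapture.Find("020202020", "20*20*2") returns two entries, both with the text 20202, starting at indices 1 and 3.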

Indeed, the search for the next match continues from where the last match ended. You can work around it by using lookahead patterns. I'm not a .NET guy, but try this: "(?=020)." Translation: "find me any single character, where this character and the next two characters are 020". The trick is that the match is only one character wide, not three, so you will get all the matches in the string, even if they overlap.

(You could also write it as "0(?=20)", but that's less clear, to humans at least :p)
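If you would rather keep the original pattern unchanged, another option is to drive the search yourself and restart one character after the start of each match, using the instance overload Regex.Match(string, int). The sketch below assumes a plain pattern without anchors or lookbehinds; the names SlidingMatcher and Count are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET library.
public static class SlidingMatcher
{
    // Counts matches of `pattern`, allowing overlaps, by restarting the
    // search one character after the START of the previous match.
    public static int Count(string input, string pattern)
    {
        var regex = new Regex(pattern);
        int count = 0;
        Match m = regex.Match(input);
        while (m.Success)
        {
            count++;
            int next = m.Index + 1;
            // A start position beyond the end of the string would throw.
            if (next > input.Length) break;
            m = regex.Match(input, next);
        }
        return count;
    }
}
```

With the string from the question, SlidingMatcher.Count("020202020", "020") finds matches starting at indices 0, 2, 4 and 6.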

Try this, using zero-width positive lookbehind:

Regex.Matches("020202020", @"(?<=020)").Count;

This worked for me and yields 4 matches.
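The lookbehind and lookahead versions return the same count but report different positions: the lookahead matches where each 020 begins, the lookbehind just after each 020 ends. A small sketch to inspect this, using the hypothetical helper MatchPositions.Of:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET library.
public static class MatchPositions
{
    // Returns the start index of every match of `pattern` in `input`.
    public static int[] Of(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }
}
```

For "020202020", the pattern (?=020) yields indices 0, 2, 4, 6, while (?<=020) yields 3, 5, 7, 9. This matters if you use Match.Index afterwards.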

My favorite reference for Regex: Regular Expression Language - Quick Reference. Also, as a quick way to try out a regex, I quite often use the Free Regular Expression Designer for complex patterns.

Assuming that you are indeed looking for sequences of consecutive 2-s, there is another option that does not use lookaheads at all. (This would not work for arbitrary sequences where you look for patterns of 0 and 2.)

Enumerate all occurrences of non-overlapping sequences of three or more 2-s (for example, by matching the pattern 2{3,}) and then infer the number of shorter subsequences.

For example, if you find one sequence of six consecutive 2-s and one of five consecutive 2-s, then you know that you must have (6-3+1) + (5-3+1) = 7 sequences of three consecutive 2-s (potentially overlapping), and so on:

0002222220000002222200
   222
    222
     222
      222
               222
                222
                 222

For large strings, this should be somewhat faster than using lookaheads.
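The counting rule above can be sketched as follows. It assumes the maximal runs are found with a pattern of the form 2{n,}; the names RunCounter and Count are hypothetical, and no performance measurement is implied.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET library.
public static class RunCounter
{
    // Counts the (possibly overlapping) windows of `length` consecutive
    // `digit` characters in `input`.
    public static int Count(string input, char digit, int length)
    {
        if (length < 1) throw new ArgumentOutOfRangeException(nameof(length));

        // Matches each maximal run of at least `length` repetitions, e.g. "2{3,}".
        string pattern = Regex.Escape(digit.ToString()) + "{" + length + ",}";
        int total = 0;
        foreach (Match run in Regex.Matches(input, pattern))
        {
            // A run of L characters contains L - length + 1 windows.
            total += run.Length - length + 1;
        }
        return total;
    }
}
```

For the string in the diagram, RunCounter.Count("0002222220000002222200", '2', 3) gives (6-3+1) + (5-3+1) = 7.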

That is because the source contains only two non-overlapping "020" sequences, and that is what your regex pattern matches. Try changing your pattern to this:

Regex.Matches("020202020", "02").Count;

Now it will match the 02's in a row and you will get four this time. Note that this counts occurrences of 02, which equals the number of overlapping 020 matches here only because every 02 in this string is followed by a 0.
