Efficiently Combine MatchCollections in .NET regular expressions

In this simplified example there are two regular expressions, one case-sensitive and the other not. The idea is to efficiently create an IEnumerable collection (see "combined" below) that combines the results.

string test = "abcABC";
string regex = "(?<grpa>a)|(?<grpb>b)|(?<grpc>c)";
Regex regNoCase = new Regex(regex, RegexOptions.IgnoreCase);
Regex regCase = new Regex(regex);

MatchCollection matchNoCase = regNoCase.Matches(test);
MatchCollection matchCase = regCase.Matches(test);

// Combine matchNoCase and matchCase into an IEnumerable
IEnumerable<Match> combined = null;
foreach (Match match in combined)
{
    // Use the Index and (successful) Groups properties
    //of the match in another operation

}

In practice, the MatchCollections might contain thousands of results, and the code runs frequently with long, dynamically created regular expressions, so I'd like to avoid copying the results to arrays and the like. I am still learning LINQ and am unsure how to go about combining these, or what the performance cost to an already sluggish process will be.

There are three steps here:

  1. Convert each MatchCollection to an IEnumerable<Match>
  2. Concatenate the sequences
  3. Filter by whether the Match.Success property is true

Code:

IEnumerable<Match> combined = matchNoCase.OfType<Match>().Concat(matchCase.OfType<Match>()).Where(m => m.Success);

Doing this creates a new enumerator that only executes each step as the next result is fetched, so you end up enumerating each collection only once in total. For example, Concat() will only start executing the second enumerator after the first runs out.
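To make the three steps concrete, here is a minimal, self-contained sketch built from the question's sample input. The names MatchCombiner and Combine are hypothetical helpers introduced only for illustration, and the pattern is written without a trailing "]", on the assumption that the bracket was not intended.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchCombiner
{
    // Hypothetical helper (not from the original answer's code):
    // lazily concatenates two MatchCollections without copying them to arrays.
    public static IEnumerable<Match> Combine(MatchCollection first, MatchCollection second)
    {
        return first.OfType<Match>()              // step 1: MatchCollection -> IEnumerable<Match>
                    .Concat(second.OfType<Match>()) // step 2: second sequence starts after the first ends
                    .Where(m => m.Success);         // step 3: keep successful matches only
    }
}

public static class Program
{
    public static void Main()
    {
        string test = "abcABC";
        string pattern = "(?<grpa>a)|(?<grpb>b)|(?<grpc>c)";

        MatchCollection matchNoCase = new Regex(pattern, RegexOptions.IgnoreCase).Matches(test);
        MatchCollection matchCase = new Regex(pattern).Matches(test);

        // Nothing is enumerated until the foreach starts pulling results.
        foreach (Match m in MatchCombiner.Combine(matchNoCase, matchCase))
        {
            foreach (string name in new[] { "grpa", "grpb", "grpc" })
            {
                if (m.Groups[name].Success)
                {
                    Console.WriteLine($"index {m.Index}: group {name} matched \"{m.Value}\"");
                }
            }
        }
    }
}
```

With this input the combined sequence holds nine matches: six from the case-insensitive regex followed by three from the case-sensitive one, so the lowercase positions appear twice.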

The answer marked correct creates an IEnumerable that contains two of each match. The correct way to combine is below:

var combined = matches.Where(e=>e.Success).Select(e=>e.Value);
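If the duplicates are the concern but the Index and Groups properties are still needed, one possible approach is to filter the concatenated sequence by position instead of projecting to Value. The sketch below is not taken from either answer: MatchDeduplicator and DistinctBySpan are hypothetical names, and treating two matches with the same Index and Length as duplicates is an assumption about what "duplicate" should mean here.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchDeduplicator
{
    // Hypothetical helper: yields the first match seen for each (Index, Length) span.
    // Enumeration stays lazy; only the small span keys are remembered,
    // the Match objects themselves are never copied to an array.
    public static IEnumerable<Match> DistinctBySpan(IEnumerable<Match> matches)
    {
        var seen = new HashSet<(int Index, int Length)>();
        foreach (Match m in matches)
        {
            // HashSet.Add returns false when the span was already yielded.
            if (m.Success && seen.Add((m.Index, m.Length)))
            {
                yield return m;
            }
        }
    }
}
```

Applied to the concatenation of the two MatchCollections from the question, this keeps one Match per position, at the cost of one set entry per distinct match.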
