
Why does C# RegexOptions.Compiled make the match slower?

I have the following code:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        const string RegXPattern = @"/api/(?<controller>\w+)/(?<action>\w+)/?$";
        var regex = new Regex(RegXPattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

        const string InputToMatch = "/api/person/load";

        regex.IsMatch(InputToMatch); // Warmup

        // Time 10 million matches against the same input.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000000; i++)
        {
            var match = regex.IsMatch(InputToMatch);
        }
        sw.Stop();

        Console.WriteLine(sw.Elapsed.ToString());
        Console.ReadLine();
    }
}

Running the above on my machine in a Release build, it finishes in around 18 seconds; removing RegexOptions.Compiled makes it run in 13 seconds.

My understanding was that including this flag would make matching faster, but in my example it results in roughly 30% lower performance.

What am I missing here?

I think it is RegexOptions.IgnoreCase that is causing the slowdown here. These are my timings for comparison:

Compiled     11s
Not compiled 10s

Using the inline modifier (?i) in the regex gives these results:

Compiled     10s
Not compiled 9s

Not including the case comparison in the regex (by using /API/(?<controller>\w+)/(?<action>\w+)/?$ as the pattern and calling .ToUpper() on the input, so that the same number of matches is done):

Compiled     6s
Not compiled 8s
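A minimal sketch of that "external case handling" variant. The class name UpperCaseRoute is hypothetical, and ToUpperInvariant is used here in place of the plain .ToUpper() mentioned above (an assumption) so that the normalization does not depend on the thread's culture:

```csharp
using System.Text.RegularExpressions;

public static class UpperCaseRoute
{
    // Pattern written in upper case; RegexOptions.IgnoreCase is deliberately not used.
    static readonly Regex Pattern = new Regex(
        @"/API/(?<controller>\w+)/(?<action>\w+)/?$",
        RegexOptions.Compiled);

    // Normalizes the input to upper case before matching, so "/api/", "/Api/"
    // and "/API/" are all accepted by the upper-case pattern.
    public static bool IsMatch(string path) =>
        Pattern.IsMatch(path.ToUpperInvariant());
}
```

The trade-off is an extra string allocation per call, and if you use Match instead of IsMatch, the captured controller and action values come back upper-cased.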

Taking this one step further (as suggested by Antonín) and handling case in the pattern itself with /[aA][pP][iI]/(?<controller>\w+)/(?<action>\w+)/?$ gives:

Compiled     5s
Not compiled 8s

From this, the fastest variant is to use RegexOptions.Compiled but handle the casing of the /api/ prefix with character classes in the regex itself.
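As a sketch of that fastest variant (ApiRoute and Parse are hypothetical names, not from the answer), the regex can be declared once with RegexOptions.Compiled and without IgnoreCase, relying on [aA][pP][iI] for the prefix. \w already matches letters of either case, so the controller and action groups are unaffected:

```csharp
using System.Text.RegularExpressions;

public static class ApiRoute
{
    // Case handling for the literal "/api/" prefix is done with character classes,
    // so RegexOptions.IgnoreCase is not needed. Only the ASCII letters a/A, p/P, i/I
    // are accepted for the prefix.
    public static readonly Regex Pattern = new Regex(
        @"/[aA][pP][iI]/(?<controller>\w+)/(?<action>\w+)/?$",
        RegexOptions.Compiled);

    // Returns the (controller, action) pair, or null if the path does not match.
    public static (string Controller, string Action)? Parse(string path)
    {
        Match m = Pattern.Match(path);
        if (!m.Success) return null;
        return (m.Groups["controller"].Value, m.Groups["action"].Value);
    }
}
```

For example, ApiRoute.Parse("/API/Person/Load") yields ("Person", "Load"), with the original casing of the captured segments preserved.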

To verify these results, I also ran them against a set of randomised (but still matching) inputs. Here are the results:

IgnoreCase | Compiled                13s
IgnoreCase                           11s
(?i) plus Compiled                   13s
(?i)                                 11s
Compiled plus external case handling 9s
External case handling               12s
Case handling in regex plus Compiled 8s
Case handling in regex               11s
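The answer doesn't show the harness used for the randomised run. A rough sketch of one, assuming random casing of the /api/ prefix and random ASCII letters for the controller and action segments (RegexBench, MakeInputs and Time are hypothetical names), might look like this; the match count is returned so you can confirm every variant accepted every input:

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexBench
{
    // Builds random paths that all match the route pattern, with random casing
    // of the "/api/" prefix so the different case-handling strategies are exercised.
    public static string[] MakeInputs(int count, int seed)
    {
        var rng = new Random(seed);
        var inputs = new string[count];
        for (int i = 0; i < count; i++)
        {
            var sb = new StringBuilder("/");
            foreach (char c in "api")
                sb.Append(rng.Next(2) == 0 ? c : char.ToUpperInvariant(c));
            sb.Append('/').Append(RandomWord(rng)).Append('/').Append(RandomWord(rng));
            inputs[i] = sb.ToString();
        }
        return inputs;
    }

    // A random word of 3 to 9 ASCII letters in mixed case.
    static string RandomWord(Random rng)
    {
        int length = rng.Next(3, 10);
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = (char)(rng.Next(2) == 0 ? 'a' + rng.Next(26) : 'A' + rng.Next(26));
        return new string(chars);
    }

    // Runs every input through the regex `rounds` times and returns the elapsed
    // time together with the number of successful matches.
    public static (TimeSpan Elapsed, long Matches) Time(Regex regex, string[] inputs, int rounds)
    {
        regex.IsMatch(inputs[0]); // warm-up, as in the question
        long matches = 0;
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            foreach (string input in inputs)
                if (regex.IsMatch(input)) matches++;
        sw.Stop();
        return (sw.Elapsed, matches);
    }
}
```

Run it in a Release build and compare Elapsed across Regex instances built with the option sets from the table above.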

As to why this is slower, this blog post discusses a possible reason.

The problem is that the compiled Regex version does a char-by-char comparison using the current culture, of the form

if .... char.ToLower(runtext[index2], CultureInfo.CurrentCulture) == 'c' ....

where the thread-static CultureInfo.CurrentCulture is retrieved again for every character.
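To make the cost concrete, here is a hand-written approximation; it is not the IL that RegexOptions.Compiled actually emits, nor the code from the PR. It contrasts a loop that fetches CultureInfo.CurrentCulture for every character with one that looks up the culture's TextInfo once per call (CaseCompare and both method names are hypothetical):

```csharp
using System.Globalization;

public static class CaseCompare
{
    // Mirrors the shape described above: the thread-static
    // CultureInfo.CurrentCulture is fetched again for every character.
    public static bool EqualsIgnoreCasePerChar(string text, int start, string lowerLiteral)
    {
        if (start < 0 || start + lowerLiteral.Length > text.Length) return false;
        for (int i = 0; i < lowerLiteral.Length; i++)
        {
            if (char.ToLower(text[start + i], CultureInfo.CurrentCulture) != lowerLiteral[i])
                return false;
        }
        return true;
    }

    // Same comparison, but the culture's TextInfo is looked up once per call
    // and reused inside the loop.
    public static bool EqualsIgnoreCaseHoisted(string text, int start, string lowerLiteral)
    {
        if (start < 0 || start + lowerLiteral.Length > text.Length) return false;
        TextInfo textInfo = CultureInfo.CurrentCulture.TextInfo;
        for (int i = 0; i < lowerLiteral.Length; i++)
        {
            if (textInfo.ToLower(text[start + i]) != lowerLiteral[i])
                return false;
        }
        return true;
    }
}
```

Both methods return the same result; the second simply avoids repeating the thread-static lookup inside the hot loop.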

The repeated CultureInfo.CurrentCulture lookup shows up in the profiler as a CPU consumer.

I have filed an issue for .NET Core and fixed it with a PR. If you need the fix in the regular .NET Framework, you should file an issue on GitHub to request a backport. The issue shows up for all compiled regexes created with one of these option combinations:

  • RegexOptions.IgnoreCase | RegexOptions.Compiled
  • RegexOptions.CultureInvariant | RegexOptions.Compiled
  • RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Compiled

The seemingly strange combination RegexOptions.CultureInvariant | RegexOptions.Compiled is in fact necessary if you create a regular expression on a thread whose locale has special casing rules or number separators, because the Regex matching code is created specifically for your current locale. If you want a locale-independent Regex, you need to use RegexOptions.CultureInvariant.
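A small sketch for observing the locale dependence described above (CultureDemo and MatchUnderCulture are hypothetical names). It switches the thread culture, builds the regex on that thread, and checks whether the pattern ^i$ matches "I". Turkish lowercases 'I' to dotless 'ı', so on a tr-TR thread the culture-sensitive IgnoreCase result may differ from the CultureInvariant one; the exact outcome can vary by runtime version, and tr-TR may be unavailable when globalization-invariant mode is enabled:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class CultureDemo
{
    // Temporarily switches the current thread's culture, constructs the regex
    // on that thread, and tests whether the pattern "^i$" matches the input "I".
    // The original culture is always restored.
    public static bool MatchUnderCulture(string cultureName, RegexOptions options)
    {
        CultureInfo saved = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = new CultureInfo(cultureName);
            var regex = new Regex("^i$", options);
            return regex.IsMatch("I");
        }
        finally
        {
            CultureInfo.CurrentCulture = saved;
        }
    }
}
```

Compare, for example, MatchUnderCulture("tr-TR", RegexOptions.IgnoreCase | RegexOptions.Compiled) with the same call plus RegexOptions.CultureInvariant.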

According to the MSDN best practices:

We recommend that you use interpreted regular expressions when you call regular expression methods with a specific regular expression relatively infrequently. You should use compiled regular expressions when you call regular expression methods with a specific regular expression relatively frequently. The exact threshold at which the slower execution speeds of interpreted regular expressions outweigh gains from their reduced startup time, or the threshold at which the slower startup times of compiled regular expressions outweigh gains from their faster execution speeds, is difficult to determine. It depends on a variety of factors, including the complexity of the regular expression and the specific data that it processes. To determine whether interpreted or compiled regular expressions offer the best performance for your particular application scenario, you can use the Stopwatch class to compare their execution times.
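Following that advice, a rough Stopwatch sketch (CompiledVsInterpreted and Measure are hypothetical names) can time construction, where RegexOptions.Compiled pays its up-front compilation cost, separately from the matching loop:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CompiledVsInterpreted
{
    // Returns the time spent constructing the Regex and the time spent on
    // `calls` IsMatch invocations against the same input.
    public static (TimeSpan Construction, TimeSpan Matching) Measure(
        string pattern, RegexOptions options, string input, int calls)
    {
        var sw = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        sw.Stop();
        TimeSpan construction = sw.Elapsed;

        sw.Restart();
        for (int i = 0; i < calls; i++)
            regex.IsMatch(input);
        sw.Stop();
        return (construction, sw.Elapsed);
    }
}
```

The very first Regex constructed in a process also pays one-time initialization costs, so measure each option set more than once (or in separate runs) before comparing construction times.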

Moreover, you have supplied two options to the Regex: IgnoreCase and Compiled. If you remove IgnoreCase, the Compiled option will give you better performance. The Compiled option is also usually preferred when declaring Regex instances globally, so that they are compiled once at startup.

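Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.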

 