How do I timeout Regex operations to prevent hanging in .NET 4.5?

There are times when being able to limit the pattern matching duration of regex operations could be useful. In particular, when working with user-supplied patterns to match data, the pattern might exhibit poor performance due to nested quantifiers and excessive backtracking (see catastrophic backtracking). One way to apply a timeout is to run the regex asynchronously, but this can be tedious and clutters the code.

According to what's new in the .NET Framework 4.5 Developer Preview, it looks like there's a new built-in approach to support this:

Ability to limit how long the regular expression engine will attempt to resolve a regular expression before it times out.

How can I use this feature? Also, what do I need to be aware of when using it?

Note: I'm asking and answering this question since it's encouraged.

I recently researched this topic since it interested me and will cover the main points here. The relevant MSDN documentation describes the feature, and the Regex class reference lists the new overloaded constructors and static methods. The code samples can be run with Visual Studio 11 Developer Preview.

The Regex class accepts a TimeSpan to specify the timeout duration. You can specify a timeout at a macro and micro level in your application, and the two can be used together:

  • Set the "REGEX_DEFAULT_MATCH_TIMEOUT" property using the AppDomain.SetData method (macro, application-wide scope)
  • Pass the matchTimeout parameter (micro, localized scope)

When the AppDomain property is set, all Regex operations will use that value as the default timeout. To override the application-wide default, simply pass a matchTimeout value to the regex constructor or static method. If an AppDomain default isn't set and matchTimeout isn't specified, pattern matching will not time out (i.e., the original pre-.NET 4.5 behavior).

There are two main exceptions to handle:

  • RegexMatchTimeoutException: thrown when a timeout occurs.
  • ArgumentOutOfRangeException: thrown when "matchTimeout is negative or greater than approximately 24 days." In addition, a TimeSpan value of zero will cause this to be thrown.

Although negative values aren't allowed, there's one exception: a value of -1 ms is accepted. Internally the Regex class accepts -1 ms, which is the value of the Regex.InfiniteMatchTimeout field, to indicate that a match should not time out (i.e., the original pre-.NET 4.5 behavior).
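The accepted and rejected values described above can be checked with a short sketch. IsAccepted is a hypothetical helper introduced only for illustration (it is not part of the Regex API); it simply tries to construct a Regex with the given timeout and reports whether the constructor allowed it.

```csharp
using System;
using System.Text.RegularExpressions;

static class TimeoutValidation
{
    // Hypothetical helper: true if the Regex constructor accepts the timeout,
    // false if it throws ArgumentOutOfRangeException.
    public static bool IsAccepted(TimeSpan matchTimeout)
    {
        try
        {
            var re = new Regex(@"\d+", RegexOptions.None, matchTimeout);
            return re.MatchTimeout == matchTimeout;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }
    }

    static void Main()
    {
        // The -1 ms special case, via the named field and spelled out by hand.
        Console.WriteLine(IsAccepted(Regex.InfiniteMatchTimeout));
        Console.WriteLine(IsAccepted(TimeSpan.FromMilliseconds(-1)));

        // Zero, other negative values, and values above roughly 24 days are rejected.
        Console.WriteLine(IsAccepted(TimeSpan.Zero));
        Console.WriteLine(IsAccepted(TimeSpan.FromSeconds(-10)));
        Console.WriteLine(IsAccepted(TimeSpan.FromDays(30)));
    }
}
```

Using Regex.InfiniteMatchTimeout rather than a hand-written -1 ms value makes the intent explicit at the call site.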

Using the matchTimeout parameter

The following example demonstrates both a valid and an invalid timeout value and how to handle the resulting exceptions:

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";
var timeouts = new[]
{
    TimeSpan.FromSeconds(4),     // valid
    TimeSpan.FromSeconds(-10)    // invalid
};

foreach (var matchTimeout in timeouts)
{
    Console.WriteLine("Input: " + matchTimeout);
    try
    {
        bool result = Regex.IsMatch(input, pattern,
                                    RegexOptions.None, matchTimeout);
    }
    catch (RegexMatchTimeoutException ex)
    {
        Console.WriteLine("Match timed out!");
        Console.WriteLine("- Timeout interval specified: " + ex.MatchTimeout);
        Console.WriteLine("- Pattern: " + ex.Pattern);
        Console.WriteLine("- Input: " + ex.Input);
    }
    catch (ArgumentOutOfRangeException ex)
    {
        Console.WriteLine(ex.Message);
    }
    Console.WriteLine();
}

When using an instance of the Regex class you have access to the MatchTimeout property:

string input = "The English alphabet has 26 letters";
string pattern = @"\d+";
var matchTimeout = TimeSpan.FromMilliseconds(10);
var sw = Stopwatch.StartNew();
try
{
    var re = new Regex(pattern, RegexOptions.None, matchTimeout);
    bool result = re.IsMatch(input);
    sw.Stop();

    Console.WriteLine("Completed match in: " + sw.Elapsed);
    Console.WriteLine("MatchTimeout specified: " + re.MatchTimeout);
    Console.WriteLine("Matched with {0} to spare!",
                         re.MatchTimeout.Subtract(sw.Elapsed));
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine(ex.Message);
}

Using the AppDomain property

The "REGEX_DEFAULT_MATCH_TIMEOUT" property is used to set an application-wide default:

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

If this property is set to an invalid TimeSpan value or an invalid object, a TypeInitializationException will be thrown when attempting to use a regex. The exception type indicates that the value is read when the Regex type is initialized, so set the property before the first regex is used.

Example with a valid property value:

// AppDomain default set somewhere in your application
AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

// regex use elsewhere...
string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    // no timeout specified, defaults to AppDomain setting
    bool result = Regex.IsMatch(input, pattern);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Default: " + ex.MatchTimeout);
}
catch (ArgumentOutOfRangeException ex)
{
    sw.Stop();
}
catch (TypeInitializationException ex)
{
    sw.Stop();
    Console.WriteLine("TypeInitializationException: " + ex.Message);
    Console.WriteLine("InnerException: {0} - {1}",
        ex.InnerException.GetType().Name, ex.InnerException.Message);
}
Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Using the above example with an invalid (negative) value would cause a TypeInitializationException to be thrown. The code that handles it writes the following message to the console:

TypeInitializationException: The type initializer for 'System.Text.RegularExpressions.Regex' threw an exception.

InnerException: ArgumentOutOfRangeException - Specified argument was out of the range of valid values. Parameter name: AppDomain data 'REGEX_DEFAULT_MATCH_TIMEOUT' contains an invalid value or object for specifying a default matching timeout for System.Text.RegularExpressions.Regex.

In both examples the ArgumentOutOfRangeException isn't thrown directly. It is caught only for completeness, so that the code shows all the exceptions you can handle when working with the new .NET 4.5 Regex timeout feature.

Overriding the AppDomain default

Overriding the AppDomain default is done by specifying a matchTimeout value. In the next example the match times out after 2 seconds instead of the default of 5 seconds.

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(5));

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    var matchTimeout = TimeSpan.FromSeconds(2);
    bool result = Regex.IsMatch(input, pattern,
                                RegexOptions.None, matchTimeout);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Default: " + ex.MatchTimeout);
}

Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Closing Remarks

MSDN recommends setting a time-out value in all regular expression pattern-matching operations. However, it doesn't draw your attention to the issues to be aware of when doing so. I don't recommend setting an AppDomain default and calling it a day. You need to know your input and know your patterns. If the input is large, or the pattern is complex, an appropriate timeout value should be used. This might also entail measuring your performance-critical regex usages to assign sane defaults. Arbitrarily assigning a timeout value to a regex that used to work fine may cause it to break if the value isn't long enough. Measure existing usages before assigning a value if you think it might abort the matching attempt too early.
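As a rough sketch of what measuring an existing usage could look like: SlowestMatch and SuggestTimeout below are hypothetical helpers, and the 10x headroom and 100 ms floor are arbitrary assumptions for illustration, not recommendations from MSDN.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexTiming
{
    // Hypothetical helper: slowest IsMatch duration over representative inputs.
    public static TimeSpan SlowestMatch(string pattern, IEnumerable<string> samples)
    {
        // No timeout while measuring, so slow matches are observed, not aborted.
        var re = new Regex(pattern, RegexOptions.None, Regex.InfiniteMatchTimeout);
        var slowest = TimeSpan.Zero;
        foreach (string sample in samples)
        {
            var sw = Stopwatch.StartNew();
            re.IsMatch(sample);
            sw.Stop();
            if (sw.Elapsed > slowest) slowest = sw.Elapsed;
        }
        return slowest;
    }

    // Hypothetical policy: 10x headroom over the slowest match, never below 100 ms.
    public static TimeSpan SuggestTimeout(TimeSpan slowest)
    {
        long ticks = Math.Max(slowest.Ticks * 10, TimeSpan.FromMilliseconds(100).Ticks);
        return TimeSpan.FromTicks(ticks);
    }

    static void Main()
    {
        var samples = new[] { "The English alphabet has 26 letters", "no digits here" };
        TimeSpan slowest = SlowestMatch(@"\d+", samples);
        Console.WriteLine("Slowest observed match: " + slowest);
        Console.WriteLine("Suggested timeout: " + SuggestTimeout(slowest));
    }
}
```

The samples should resemble real production input, including the largest inputs you expect. Timings from a handful of short strings say little about worst-case behavior.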

Moreover, this feature is useful when handling user-supplied patterns. Yet learning how to write proper patterns that perform well is important. Slapping a timeout on a regex to make up for a lack of knowledge in proper pattern construction isn't good practice.
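For the user-supplied case, one possible shape is a small wrapper that treats both a timeout and an unparsable pattern as "no result". TryIsMatch is a hypothetical helper, and returning null for both failure modes is a simplifying assumption; real code would likely report them separately.

```csharp
using System;
using System.Text.RegularExpressions;

static class UserPatternMatcher
{
    // Hypothetical helper: true/false when the match completes, null when the
    // pattern is invalid or the match exceeds matchTimeout.
    public static bool? TryIsMatch(string input, string userPattern, TimeSpan matchTimeout)
    {
        try
        {
            return Regex.IsMatch(input, userPattern, RegexOptions.None, matchTimeout);
        }
        catch (RegexMatchTimeoutException)
        {
            return null; // the pattern took too long on this input
        }
        catch (ArgumentException)
        {
            // The pattern failed to parse. This also covers an out-of-range
            // matchTimeout, since ArgumentOutOfRangeException derives from it.
            return null;
        }
    }

    static void Main()
    {
        var timeout = TimeSpan.FromSeconds(1);
        Console.WriteLine(TryIsMatch("26 letters", @"\d+", timeout).HasValue);
        Console.WriteLine(TryIsMatch("26 letters", "(", timeout).HasValue);
    }
}
```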
