

Regex.IsMatch() vs. String.ToUpper().Contains() performance

Since there is no case-insensitive string.Contains() in .NET (yet a case-insensitive version of string.Equals() exists, which baffles me, but I digress), what are the performance differences between using Regex.IsMatch() and using String.ToUpper().Contains()?

Example:

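using System.Text.RegularExpressions;

string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

// Case-insensitive regex match
bool containsStringRegex = Regex.IsMatch(testString, "string", RegexOptions.IgnoreCase);
// Upper-case the whole input, then do an ordinal Contains
bool containsStringUpper = testString.ToUpper().Contains("STRING");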

I've always heard that string.ToUpper() is a very expensive call, so I shy away from it when I want to do string.Contains() comparisons. How does Regex.IsMatch() compare in terms of performance?
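One concrete reason ToUpper() costs more: it returns a new string, so every call allocates and fills an upper-case copy of the input before Contains even runs. The sketch below measures that allocation. It assumes .NET Core 3.0 or later, where GC.GetAllocatedBytesForCurrentThread() is available, and BytesAllocatedBy is a hypothetical helper written only for this illustration. The ToUpper path should report a non-zero byte count; exact figures vary by runtime, so none are shown here.

```csharp
using System;

public static class AllocationDemo
{
    // Hypothetical helper: bytes allocated by the current thread while running `action` once.
    // Requires .NET Core 3.0+ for GC.GetAllocatedBytesForCurrentThread().
    public static long BytesAllocatedBy(Action action)
    {
        action(); // warm-up: JIT compilation and one-time culture-data setup happen here
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";
        bool result = false;

        // ToUpper() builds a brand-new upper-case string on every call.
        long upper = BytesAllocatedBy(() => result = testString.ToUpper().Contains("STRING"));
        Console.WriteLine($"ToUpper().Contains: {upper} bytes per call (match: {result})");

        // IndexOf with OrdinalIgnoreCase compares in place; no copy of testString is made.
        long indexOf = BytesAllocatedBy(
            () => result = testString.IndexOf("string", StringComparison.OrdinalIgnoreCase) >= 0);
        Console.WriteLine($"IndexOf(OrdinalIgnoreCase): {indexOf} bytes per call (match: {result})");
    }
}
```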

Is there a more efficient approach for doing such comparisons?

Here's a benchmark:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();

        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

        // 1. Compiled, case-insensitive Regex.
        // Note: the Regex is constructed after sw.Start(), so its compilation time is included here.
        sw.Start();
        var re = new Regex("string", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
        for (int i = 0; i < 1000000; i++)
        {
            bool containsString = re.IsMatch(testString);
        }
        sw.Stop();
        Console.WriteLine("RX: " + sw.ElapsedMilliseconds);

        // 2. Upper-case copy of the input, then an ordinal Contains.
        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            bool containsStringUpper = testString.ToUpper().Contains("STRING");
        }
        sw.Stop();
        Console.WriteLine("Contains: " + sw.ElapsedMilliseconds);

        // 3. Ordinal, case-insensitive IndexOf; no intermediate string is allocated.
        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            bool containsStringIndexOf = testString.IndexOf("STRING", StringComparison.OrdinalIgnoreCase) >= 0;
        }
        sw.Stop();
        Console.WriteLine("IndexOf: " + sw.ElapsedMilliseconds);
    }
}

Results were:

IndexOf (183 ms) was the fastest, followed by ToUpper().Contains (400 ms) and the compiled Regex (477 ms).

(Output times updated after switching to the compiled Regex.)
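In the benchmark above, the stopwatch starts before the compiled Regex is constructed, so the RX figure also includes regex compilation. The sketch below is a variation, not the original answerer's code: it builds the Regex once before any timing, warms each candidate up with one call, and counts matches so every call's result is actually consumed. Each candidate runs through a Func<string, bool> delegate, which adds the same small overhead to all three. The names Bench and Time are hypothetical, and no expected timings are shown because they depend on hardware and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class Bench
{
    const int Iterations = 1_000_000;
    const string TestString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

    // Built once during type initialization, outside any timed region.
    static readonly Regex Re = new Regex("string",
        RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

    // Runs `check` Iterations times; returns elapsed milliseconds and the number of matches.
    public static (long Ms, int Hits) Time(Func<string, bool> check)
    {
        int hits = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            if (check(TestString)) hits++;
        }
        sw.Stop();
        return (sw.ElapsedMilliseconds, hits);
    }

    public static void Main()
    {
        var candidates = new (string Name, Func<string, bool> Check)[]
        {
            ("Regex", s => Re.IsMatch(s)),
            ("ToUpper().Contains", s => s.ToUpper().Contains("STRING")),
            ("IndexOf(OrdinalIgnoreCase)", s => s.IndexOf("string", StringComparison.OrdinalIgnoreCase) >= 0),
        };

        foreach (var (name, check) in candidates)
        {
            check(TestString); // warm-up: JIT-compile this path before timing it
            var (ms, hits) = Time(check);
            Console.WriteLine($"{name}: {ms} ms ({hits} hits)");
        }
    }
}
```

A single warm-up call only triggers the initial JIT; for more trustworthy numbers, a harness such as BenchmarkDotNet takes care of warm-up, tiered compilation, and statistics.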

There is another option, String.IndexOf(String, StringComparison), that may be more efficient than either of the two you suggested:

using System;
string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";
// Ordinal, case-insensitive search; no upper-case copy of the string is created
bool contained = testString.IndexOf("string", StringComparison.OrdinalIgnoreCase) >= 0;
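On .NET Core 2.1 and later (including .NET 5+), String also has a Contains(String, StringComparison) overload, so the IndexOf(...) >= 0 idiom is no longer required there. .NET Framework does not have that overload, which is why the IndexOf form remains the portable choice. A minimal sketch, assuming one of the newer runtimes:

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

        // Requires .NET Core 2.1+ / .NET 5+; this overload does not exist on .NET Framework.
        bool contained = testString.Contains("string", StringComparison.OrdinalIgnoreCase);
        Console.WriteLine(contained); // True
    }
}
```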

If you need a culture-sensitive comparison, use CurrentCultureIgnoreCase instead of OrdinalIgnoreCase.
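If the choice between ordinal and culture-sensitive matching comes up in several places, it can be wrapped once around String.IndexOf(String, StringComparison). This is only a sketch: ContainsIgnoreCase and its cultureSensitive flag are hypothetical names, not part of the BCL. The culture-sensitive result depends on the current culture; under Turkish casing rules, for example, "i" and "I" are not case variants of each other, so that call may return False there.

```csharp
using System;

// Hypothetical helper; not a BCL method.
public static class StringExtensions
{
    // Case-insensitive substring test built on String.IndexOf(String, StringComparison).
    // cultureSensitive == false -> OrdinalIgnoreCase (culture-independent)
    // cultureSensitive == true  -> CurrentCultureIgnoreCase (current culture's casing rules)
    public static bool ContainsIgnoreCase(this string source, string value, bool cultureSensitive = false)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (value == null) throw new ArgumentNullException(nameof(value));

        StringComparison comparison = cultureSensitive
            ? StringComparison.CurrentCultureIgnoreCase
            : StringComparison.OrdinalIgnoreCase;
        return source.IndexOf(value, comparison) >= 0;
    }
}

public static class Program
{
    public static void Main()
    {
        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";
        Console.WriteLine(testString.ContainsIgnoreCase("string"));       // True
        Console.WriteLine(testString.ContainsIgnoreCase("xyz"));          // False
        Console.WriteLine(testString.ContainsIgnoreCase("string", true)); // depends on the current culture
    }
}
```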

Based on personal experience with regular expression parsers in general, I would expect Regex.Match to be slow. But as many folks have mentioned, profiling is the best way to find out for sure. I've had to fix performance issues caused by regular expression parsers, but ToLower and ToUpper have never come back to bite me.

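Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.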
