RegEx vs string manipulation functions: What is better

If I have to find, let's say, a word in a sentence, I can think of two approaches:

  1. Using string.IndexOf
  2. Using Regex

Which one is better in terms of performance or best practice?

If it's fairly straightforward to do something without regex, it's almost always cheaper that way. String.IndexOf (or String.Contains) is definitely an example of this.
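As a minimal sketch of the non-regex route, the plain substring check might look like the following. The class and method names (SubstringDemo, ContainsOrdinal) are made up for illustration, and passing StringComparison.Ordinal is an assumption on my part: it asks for a raw character comparison instead of the culture-sensitive default.

```csharp
using System;

static class SubstringDemo
{
    // Hypothetical helper: true when 'text' contains 'value' as a plain substring.
    // StringComparison.Ordinal compares raw char values, with no culture rules.
    public static bool ContainsOrdinal(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal) >= 0;
    }

    static void Main()
    {
        const string sentence = "The quick brown fox jumps over the lazy dog";

        Console.WriteLine(sentence.Contains("jumps"));                          // True
        Console.WriteLine(sentence.IndexOf("jumps", StringComparison.Ordinal)); // 20
        Console.WriteLine(ContainsOrdinal(sentence, "cat"));                    // False
    }
}
```

Note that this also reports a hit when the value appears inside a longer word, which may or may not be what you want.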

It depends on your exact requirements. If you truly need to find a word in a sentence (not a substring), then I believe that could be expressed more concisely and more explicitly using a well-named regex pattern than using IndexOf plus all the extra logic to make sure you're actually getting a complete single word.

On the other hand, if you're simply looking for a substring, then IndexOf is far superior in terms of performance and readability.
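To make the whole-word case concrete, here is a rough sketch of both routes side by side. The names (WordSearch, ContainsWordRegex, ContainsWordIndexOf, IsWordChar) are hypothetical. It assumes the word itself consists only of letters, digits, or underscores, and the hand-written IsWordChar is only an approximation of what the regex engine treats as a word character.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordSearch
{
    // Regex route: \b is a word boundary, and Regex.Escape keeps any
    // metacharacters in 'word' from being read as pattern syntax.
    public static bool ContainsWordRegex(string sentence, string word)
    {
        return Regex.IsMatch(sentence, @"\b" + Regex.Escape(word) + @"\b");
    }

    // IndexOf route: find each occurrence, then inspect the character on either side.
    public static bool ContainsWordIndexOf(string sentence, string word)
    {
        if (word.Length == 0) return false;

        int start = 0;
        while (true)
        {
            int i = sentence.IndexOf(word, start, StringComparison.Ordinal);
            if (i < 0) return false;

            int end = i + word.Length;
            bool leftOk = i == 0 || !IsWordChar(sentence[i - 1]);
            bool rightOk = end == sentence.Length || !IsWordChar(sentence[end]);
            if (leftOk && rightOk) return true;

            start = i + 1; // this hit was inside a longer word; keep scanning
        }
    }

    // Simplified stand-in for the regex notion of a "word character".
    static bool IsWordChar(char c)
    {
        return char.IsLetterOrDigit(c) || c == '_';
    }

    static void Main()
    {
        Console.WriteLine(ContainsWordRegex("The cat sat.", "cat"));   // True
        Console.WriteLine(ContainsWordRegex("concatenate", "cat"));    // False
        Console.WriteLine(ContainsWordIndexOf("The cat sat.", "cat")); // True
        Console.WriteLine(ContainsWordIndexOf("concatenate", "cat"));  // False
    }
}
```

The regex version fits on one line, while the IndexOf version needs the boundary checks and the rescan loop spelled out by hand, which is the trade-off described above.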

This is by no means the most scientific way of measuring things, but here is a bit of source code that indicates (under very specific constraints) that regex is about 4 times slower than IndexOf.

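using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    private const string Sentence = "The quick brown fox jumps over the lazy dog";
    private const string Word = "jumps";

    static void Main(string[] args)
    {
        // Elapsed ticks for each individual call.
        var indexTimes = new List<long>();
        var regexTimes = new List<long>();
        var timer = new Stopwatch();

        // Time 1000 separate IndexOf calls.
        for (int i = 0; i < 1000; i++)
        {
            timer.Reset();
            timer.Start();
            Sentence.IndexOf(Word);
            timer.Stop();
            indexTimes.Add(timer.ElapsedTicks);
        }

        // Average ticks per IndexOf call.
        Console.WriteLine(indexTimes.Average());

        // Time 1000 separate Regex.Match calls on the same input.
        for (int i = 0; i < 1000; i++)
        {
            timer.Reset();
            timer.Start();
            Regex.Match(Sentence, Word);
            timer.Stop();
            regexTimes.Add(timer.ElapsedTicks);
        }

        // Average ticks per Regex.Match call.
        Console.WriteLine(regexTimes.Average());

        // Keep the console window open until Enter is pressed.
        Console.ReadLine();
    }
}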
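Each call timed this way is very short, so timer overhead and resolution can make up a noticeable share of every measurement. One possible refinement, sketched here under my own assumptions, is to time a whole batch of calls at once, make one warm-up call first, and build the Regex instance only once. The names (BatchBenchmark, TimeBatch) and the iteration count are illustrative choices. The delegate call adds the same small overhead to both measurements, so the ratio is still only a rough indication.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class BatchBenchmark
{
    private const string Sentence = "The quick brown fox jumps over the lazy dog";
    private const string Word = "jumps";
    private const int Iterations = 1000000; // arbitrary batch size for this sketch

    // Runs 'action' Iterations times and returns the total elapsed milliseconds.
    public static double TimeBatch(Action action)
    {
        action(); // warm-up call, so one-time JIT and setup costs are not timed

        var timer = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            action();
        }
        timer.Stop();
        return timer.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        var regex = new Regex(Regex.Escape(Word)); // built once, reused for every match
        int sink = 0; // accumulate results so the calls have an observable effect

        double indexMs = TimeBatch(() => sink += Sentence.IndexOf(Word, StringComparison.Ordinal));
        double regexMs = TimeBatch(() => sink += regex.Match(Sentence).Index);

        Console.WriteLine("IndexOf: " + indexMs + " ms");
        Console.WriteLine("Regex:   " + regexMs + " ms");
        Console.WriteLine("(checksum " + sink + ")");
    }
}
```

The numbers depend on the runtime, the hardware, and the input, so treat any ratio it prints as specific to your own setup.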

In terms of best practices, string.IndexOf is probably a little more obvious to someone reading the code. People's brains tend to close up as soon as they see a regular expression, so something straightforward like IndexOf would keep their brains open.

As for performance, that depends on a lot of things and can only be properly answered by benchmarking the specific code.
