Contains is faster than StartsWith?

A consultant came by yesterday and somehow the topic of strings came up. He mentioned that he had noticed that for strings shorter than a certain length, Contains is actually faster than StartsWith. I had to see it with my own two eyes, so I wrote a little app and, sure enough, Contains is faster!

How is this possible?

using System;

class Program
{
    static void Main()
    {
        DateTime start = DateTime.MinValue;
        DateTime end = DateTime.MinValue;
        string str = "Hello there";

        start = DateTime.Now;
        for (int i = 0; i < 10000000; i++)
        {
            str.Contains("H");
        }
        end = DateTime.Now;
        // TotalMilliseconds is the whole elapsed time; .Milliseconds is only the 0-999 ms component
        Console.WriteLine("{0}ms using Contains", end.Subtract(start).TotalMilliseconds);

        start = DateTime.Now;
        for (int i = 0; i < 10000000; i++)
        {
            str.StartsWith("H");
        }
        end = DateTime.Now;
        Console.WriteLine("{0}ms using StartsWith", end.Subtract(start).TotalMilliseconds);
    }
}

Output:

726ms using Contains 
865ms using StartsWith

I've tried it with longer strings too!

Try using Stopwatch to measure the speed instead of checking DateTime.

See: Stopwatch vs. using System.DateTime.Now for timing events
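A small sketch of why Stopwatch is the better tool: it reports whether it is backed by a high-resolution counter and how many ticks per second that counter has, while DateTime.Now only advances in discrete steps. On some systems (for example .NET Framework on Windows) that step can be several milliseconds; the step you observe depends on your OS and runtime. The class and method names below are hypothetical.

```csharp
using System;
using System.Diagnostics;

static class TimerResolution
{
    // Spin until DateTime.Now changes and return the observed step in milliseconds.
    public static double DateTimeNowStepMs()
    {
        DateTime first = DateTime.Now;
        DateTime next = first;
        while (next == first)
        {
            next = DateTime.Now;
        }
        return (next - first).TotalMilliseconds;
    }

    static void Main()
    {
        // True when Stopwatch is based on a high-resolution performance counter.
        Console.WriteLine("Stopwatch.IsHighResolution: {0}", Stopwatch.IsHighResolution);
        // Number of timer ticks per second.
        Console.WriteLine("Stopwatch.Frequency: {0} ticks/s", Stopwatch.Frequency);
        // Smallest increment of DateTime.Now observed on this machine.
        Console.WriteLine("DateTime.Now step: {0} ms", DateTimeNowStepMs());
    }
}
```

If the DateTime.Now step is large compared to the differences you are measuring, DateTime-based timings cannot distinguish them reliably.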

I think the key is in the following remarks from the documentation:

Contains:

This method performs an ordinal (case-sensitive and culture-insensitive) comparison.

StartsWith:

This method performs a word (case-sensitive and culture-sensitive) comparison using the current culture.

I think the key is the ordinal comparison, which amounts to:

An ordinal sort compares strings based on the numeric value of each Char object in the string. An ordinal comparison is automatically case-sensitive because the lowercase and uppercase versions of a character have different code points. However, if case is not important in your application, you can specify an ordinal comparison that ignores case. This is equivalent to converting the string to uppercase using the invariant culture and then performing an ordinal comparison on the result.

References:

http://msdn.microsoft.com/en-us/library/system.string.aspx

http://msdn.microsoft.com/en-us/library/dy85x1sa.aspx

http://msdn.microsoft.com/en-us/library/baketfxw.aspx

Using Reflector, you can see the code for the two:

public bool Contains(string value)
{
    return (this.IndexOf(value, StringComparison.Ordinal) >= 0);
}

public bool StartsWith(string value, bool ignoreCase, CultureInfo culture)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (this == value)
    {
        return true;
    }
    CultureInfo info = (culture == null) ? CultureInfo.CurrentCulture : culture;
    return info.CompareInfo.IsPrefix(this, value,
        ignoreCase ? CompareOptions.IgnoreCase : CompareOptions.None);
}
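The two decompiled bodies can be restated with public APIs to make the difference concrete: Contains is an ordinal IndexOf, while the StartsWith(string) overload is a culture-sensitive prefix test against the current culture. A sketch under that reading of the decompiled code; the helper names ContainsViaIndexOf and StartsWithViaIsPrefix are hypothetical.

```csharp
using System;
using System.Globalization;

static class ContainsVsStartsWith
{
    // Same idea as the decompiled Contains: an ordinal IndexOf.
    public static bool ContainsViaIndexOf(string s, string value) =>
        s.IndexOf(value, StringComparison.Ordinal) >= 0;

    // Same idea as the decompiled StartsWith with a null culture:
    // a prefix test using the current culture's CompareInfo.
    public static bool StartsWithViaIsPrefix(string s, string value) =>
        CultureInfo.CurrentCulture.CompareInfo.IsPrefix(s, value, CompareOptions.None);

    static void Main()
    {
        string str = "Hello there";
        foreach (string v in new[] { "H", "there", "h", "x" })
        {
            Console.WriteLine("{0,-6} Contains={1} IndexOf(Ordinal)={2} StartsWith={3} IsPrefix={4}",
                v, str.Contains(v), ContainsViaIndexOf(str, v),
                str.StartsWith(v), StartsWithViaIsPrefix(str, v));
        }
    }
}
```

The culture-sensitive path goes through CompareInfo, which has to apply linguistic rules, whereas the ordinal path only compares Char values.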

I figured it out. It's because StartsWith is culture-sensitive, while Contains is not. That inherently means StartsWith has to do more work.

FWIW, here are my results on Mono with the (corrected) benchmark below:

1988.7906ms using Contains
10174.1019ms using StartsWith

I'd be glad to see people's results on Microsoft's .NET, but my main point is that, when the benchmark is done correctly (and assuming similar optimizations), I think StartsWith has to be slower:

using System;
using System.Diagnostics;

public class ContainsStartsWith
{
    public static void Main()
    {
        string str = "Hello there";

        Stopwatch s = new Stopwatch();
        s.Start();
        for (int i = 0; i < 10000000; i++)
        {
            str.Contains("H");
        }
        s.Stop();
        Console.WriteLine("{0}ms using Contains", s.Elapsed.TotalMilliseconds);

        s.Reset();
        s.Start();
        for (int i = 0; i < 10000000; i++)
        {
            str.StartsWith("H");
        }
        s.Stop();
        Console.WriteLine("{0}ms using StartsWith", s.Elapsed.TotalMilliseconds);

    }
}
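One way to test the culture explanation is to add a third measurement that calls StartsWith with StringComparison.Ordinal, so it performs the same kind of comparison as Contains. The sketch below assumes .NET with System.Diagnostics.Stopwatch; the helper name Time is hypothetical. Every call goes through a delegate, which adds the same overhead to each variant, so only the relative numbers are meaningful, and results will vary by runtime and machine.

```csharp
using System;
using System.Diagnostics;

static class OrdinalBenchmark
{
    // Runs the action the given number of times and returns elapsed milliseconds.
    public static double Time(Func<bool> action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not included in the timing
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        const int Iterations = 10000000;
        string str = "Hello there";
        Console.WriteLine("{0}ms using Contains", Time(() => str.Contains("H"), Iterations));
        Console.WriteLine("{0}ms using StartsWith", Time(() => str.StartsWith("H"), Iterations));
        Console.WriteLine("{0}ms using StartsWith(Ordinal)",
            Time(() => str.StartsWith("H", StringComparison.Ordinal), Iterations));
    }
}
```

If the ordinal StartsWith lands close to Contains, that supports the idea that the culture-sensitive comparison, not the prefix check itself, is what costs time.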

StartsWith and Contains behave completely differently when it comes to culture-sensitive issues.

In particular, StartsWith returning true does NOT imply Contains returning true. You should replace one with the other only if you really know what you are doing.

using System;

class Program
{
    static void Main()
    {
        var x = "A";
        var y = "A\u0640";

        Console.WriteLine(x.StartsWith(y)); // True
        Console.WriteLine(x.Contains(y)); // False
    }
}
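If you need StartsWith to agree with Contains for inputs like this, one option is to request an ordinal comparison explicitly. A minimal sketch using the String.StartsWith(string, StringComparison) overload; the culture-sensitive line is printed without an expected value because its result depends on the culture data the runtime uses.

```csharp
using System;

static class PrefixCheck
{
    static void Main()
    {
        var x = "A";
        var y = "A\u0640"; // U+0640 ARABIC TATWEEL

        // Culture-sensitive (current culture): the result depends on culture data.
        Console.WriteLine(x.StartsWith(y));
        // Ordinal: y is longer than x, so it cannot be a prefix of x.
        Console.WriteLine(x.StartsWith(y, StringComparison.Ordinal)); // False
        // Contains is ordinal as well.
        Console.WriteLine(x.Contains(y)); // False
    }
}
```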

I twiddled around in Reflector and found a potential answer:

Contains:

return (this.IndexOf(value, StringComparison.Ordinal) >= 0);

StartsWith:

...
    switch (comparisonType)
    {
        case StringComparison.CurrentCulture:
            return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);

        case StringComparison.CurrentCultureIgnoreCase:
            return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);

        case StringComparison.InvariantCulture:
            return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);

        case StringComparison.InvariantCultureIgnoreCase:
            return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);

        case StringComparison.Ordinal:
            return ((this.Length >= value.Length) && (nativeCompareOrdinalEx(this, 0, value, 0, value.Length) == 0));

        case StringComparison.OrdinalIgnoreCase:
            return ((this.Length >= value.Length) && (TextInfo.CompareOrdinalIgnoreCaseEx(this, 0, value, 0, value.Length, value.Length) == 0));
    }
    throw new ArgumentException(Environment.GetResourceString("NotSupported_StringComparison"), "comparisonType");

And there are some overloads, so the default comparison uses CurrentCulture.

So, first of all, Ordinal will be faster anyway (if the string is close to the beginning), right? And secondly, there's more logic here, which could slow things down (although it's trivial).
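The StringComparison.Ordinal case in the switch above boils down to a length check plus an ordinal comparison of the first value.Length characters. The sketch below reproduces that logic with the public String.CompareOrdinal(strA, indexA, strB, indexB, length) overload in place of the internal nativeCompareOrdinalEx; the helper name OrdinalStartsWith is hypothetical and not part of the BCL.

```csharp
using System;

static class OrdinalPrefix
{
    // Hypothetical helper mirroring the Ordinal branch: length check, then
    // ordinal comparison of the first value.Length characters.
    public static bool OrdinalStartsWith(string s, string value)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (value == null) throw new ArgumentNullException(nameof(value));
        return s.Length >= value.Length
            && string.CompareOrdinal(s, 0, value, 0, value.Length) == 0;
    }

    static void Main()
    {
        foreach (string v in new[] { "H", "Hello", "h", "Hello there!", "" })
        {
            // Compare the helper with the built-in ordinal overload.
            Console.WriteLine("{0,-14} {1} {2}", "\"" + v + "\"",
                OrdinalStartsWith("Hello there", v),
                "Hello there".StartsWith(v, StringComparison.Ordinal));
        }
    }
}
```

In practice, passing StringComparison.Ordinal to StartsWith gives the same result without a custom helper.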

Here is a benchmark of StartsWith vs Contains. As you can see, StartsWith with an ordinal comparison performs well, and you should take note of the memory allocated by each method.

|                                   Method |         Mean |      Error |       StdDev |       Median |     Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------------------- |-------------:|-----------:|-------------:|-------------:|----------:|------:|------:|----------:|
|                         EnumEqualsMethod |  1,079.67 us |  43.707 us |   114.373 us |  1,059.98 us | 1019.5313 |     - |     - | 4800000 B |
|                             EnumEqualsOp |     28.15 us |   0.533 us |     0.547 us |     28.34 us |         - |     - |     - |         - |
|                             ContainsName |  1,572.15 us | 152.347 us |   449.198 us |  1,639.93 us |         - |     - |     - |         - |
|                        ContainsShortName |  1,771.03 us | 103.982 us |   306.592 us |  1,749.32 us |         - |     - |     - |         - |
|                           StartsWithName | 14,511.94 us | 764.825 us | 2,255.103 us | 14,592.07 us |         - |     - |     - |         - |
|                StartsWithNameOrdinalComp |  1,147.03 us |  32.467 us |    93.674 us |  1,153.34 us |         - |     - |     - |         - |
|      StartsWithNameOrdinalCompIgnoreCase |  1,519.30 us | 134.951 us |   397.907 us |  1,264.27 us |         - |     - |     - |         - |
|                      StartsWithShortName |  7,140.82 us |  61.513 us |    51.366 us |  7,133.75 us |         - |     - |     - |       4 B |
|           StartsWithShortNameOrdinalComp |    970.83 us |  68.742 us |   202.686 us |  1,019.14 us |         - |     - |     - |         - |
| StartsWithShortNameOrdinalCompIgnoreCase |    802.22 us |  15.975 us |    32.270 us |    792.46 us |         - |     - |     - |         - |
|      EqualsSubstringOrdinalCompShortName |  4,578.37 us |  91.567 us |   231.402 us |  4,588.09 us |  679.6875 |     - |     - | 3200000 B |
|             EqualsOpShortNametoCharArray |  1,937.55 us |  53.821 us |   145.508 us |  1,901.96 us | 1695.3125 |     - |     - | 8000000 B |

Here is my benchmark code: https://gist.github.com/KieranMcCormick/b306c8493084dfc953881a68e0e6d55b

Let's examine what ILSpy says about these two...

public virtual int IndexOf(string source, string value, int startIndex, int count, CompareOptions options)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (startIndex > source.Length)
    {
        throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }
    if (source.Length == 0)
    {
        if (value.Length == 0)
        {
            return 0;
        }
        return -1;
    }
    else
    {
        if (startIndex < 0)
        {
            throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_Index"));
        }
        if (count < 0 || startIndex > source.Length - count)
        {
            throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_Count"));
        }
        if (options == CompareOptions.OrdinalIgnoreCase)
        {
            return source.IndexOf(value, startIndex, count, StringComparison.OrdinalIgnoreCase);
        }
        if ((options & ~(CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreSymbols | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth)) != CompareOptions.None && options != CompareOptions.Ordinal)
        {
            throw new ArgumentException(Environment.GetResourceString("Argument_InvalidFlag"), "options");
        }
        return CompareInfo.InternalFindNLSStringEx(this.m_dataHandle, this.m_handleOrigin, this.m_sortName, CompareInfo.GetNativeCompareFlags(options) | 4194304 | ((source.IsAscii() && value.IsAscii()) ? 536870912 : 0), source, count, startIndex, value, value.Length);
    }
}

It looks like it considers culture as well, but the culture is defaulted.

public bool StartsWith(string value, StringComparison comparisonType)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (comparisonType < StringComparison.CurrentCulture || comparisonType > StringComparison.OrdinalIgnoreCase)
    {
        throw new ArgumentException(Environment.GetResourceString("NotSupported_StringComparison"), "comparisonType");
    }
    if (this == value)
    {
        return true;
    }
    if (value.Length == 0)
    {
        return true;
    }
    switch (comparisonType)
    {
    case StringComparison.CurrentCulture:
        return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);
    case StringComparison.CurrentCultureIgnoreCase:
        return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);
    case StringComparison.InvariantCulture:
        return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);
    case StringComparison.InvariantCultureIgnoreCase:
        return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);
    case StringComparison.Ordinal:
        return this.Length >= value.Length && string.nativeCompareOrdinalEx(this, 0, value, 0, value.Length) == 0;
    case StringComparison.OrdinalIgnoreCase:
        return this.Length >= value.Length && TextInfo.CompareOrdinalIgnoreCaseEx(this, 0, value, 0, value.Length, value.Length) == 0;
    default:
        throw new ArgumentException(Environment.GetResourceString("NotSupported_StringComparison"), "comparisonType");
    }
}

By contrast, the only relevant difference I see is an extra length check.

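Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM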