String.Join performance issue in C#

I've been researching a question that was presented to me: how to write a function that takes a string as input and returns a string with spaces between the characters. The function is to be written to optimize performance when it is called thousands of times per second.

  1. I know that .NET has a function called String.Join, to which I may pass the space character as a separator along with the original string.

  2. Barring the use of String.Join, I can use the StringBuilder class to append a space after each character.

  3. Another way to accomplish this task is to declare a character array with 2*n-1 characters (n characters from the input plus n-1 characters for the spaces). The character array can be filled in a loop and then passed to the String constructor.

I've written some .NET code that runs each of these algorithms one million times with the parameter "Hello, World" and measures how long it takes to execute. Method (3) is much, much faster than (1) or (2).

I know that (3) should be very fast because it avoids creating any additional string objects that later have to be garbage collected, but it seems to me that a built-in .NET function such as String.Join should yield good performance. Why is using String.Join so much slower than doing the work by hand?

using System.Linq;
using System.Text;

public static class TestClass
{
    // 491 milliseconds for 1 million iterations
    // Joins the characters through the generic IEnumerable<T> overload.
    public static string Space1(string s)
    {
        return string.Join(" ", s.AsEnumerable());
    }

    // 190 milliseconds for 1 million iterations
    // Appends a space and the next character to a StringBuilder.
    public static string Space2(string s)
    {
        if (s.Length < 2)
            return s;
        StringBuilder sb = new StringBuilder();
        sb.Append(s[0]);
        for (int i = 1; i < s.Length; i++)
        {
            sb.Append(' ');
            sb.Append(s[i]);
        }
        return sb.ToString();
    }

    // 50 milliseconds for 1 million iterations
    // Fills a char array that is allocated once with its final length (2n - 1).
    public static string Space3(string s)
    {
        if (s.Length < 2)
            return s;
        char[] array = new char[s.Length * 2 - 1];
        array[0] = s[0];
        for (int i = 1; i < s.Length; i++)
        {
            array[2 * i - 1] = ' ';
            array[2 * i] = s[i];
        }
        return new string(array);
    }
}

Update: I have changed my project to "Release" mode and updated my elapsed times in the question accordingly.

Why is using String.Join so much slower than doing the work by hand?

The reason String.Join is slower in this case is that you can write an algorithm that has prior knowledge of the exact nature of your IEnumerable<T>.

String.Join<T>(string, IEnumerable<T>) (the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it gives up raw performance and speed in exchange for flexibility.

Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this is typically only done when that "special case" is going to be common.

Here, you're effectively creating an edge case where a hand-written routine will be faster, but it is not a common use case of String.Join. Since you know exactly, in advance, what is required, you can avoid all of the overhead that a flexible design requires by pre-allocating an array of exactly the right size and building the result manually.

You'll find that, in general, it's often possible to write a method that will outperform some of the framework routines for specific input data. This is common, as the framework routines have to work with any dataset, which means that they can't be optimized for a specific input scenario.

Your String.Join example works on an IEnumerable<char>. Enumerating an IEnumerable<T> with foreach is often slower than executing a for loop (it depends on the collection type and other circumstances, as Dave Black pointed out in a comment). Even if Join uses a StringBuilder, the internal buffer of the StringBuilder will have to be enlarged several times, since the number of items to append is not known in advance.
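The buffer growth described above can be avoided whenever the final length is known up front. As a minimal sketch (Space2Presized is a hypothetical name, not part of the question's code, and it has not been benchmarked here), the StringBuilder from the second method can be given its final capacity of 2n - 1 characters when it is created:

```csharp
using System.Text;

public static class PresizedBuilderSketch
{
    // Hypothetical variant of Space2: same loop, but the builder is created
    // with its final capacity so its internal buffer never has to grow.
    public static string Space2Presized(string s)
    {
        if (s.Length < 2)
            return s;

        // n characters from the input plus n - 1 separating spaces.
        var sb = new StringBuilder(s.Length * 2 - 1);
        sb.Append(s[0]);
        for (int i = 1; i < s.Length; i++)
        {
            sb.Append(' ');
            sb.Append(s[i]);
        }
        return sb.ToString();
    }
}
```

This still pays for two Append calls per character, so it only removes the resizing cost, not the per-call overhead.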

Since you aren't using the Release build (which should have optimizations enabled by default) and/or you're debugging through Visual Studio, the JIT compiler will be prevented from making a lot of its optimizations. Because of this, you're not getting a good picture of how long each operation really takes. Once you turn the optimizations on, you can get the real picture of what's going on.

It's also important that you not be debugging in Visual Studio. Go to the bin/release folder and double-click the executable, entirely outside of Visual Studio.
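For reference, a timing loop of the kind the question describes might look like the sketch below. The question does not show its measuring code, so the TimingSketch class and its Measure helper are assumptions made for illustration; the warm-up call is there so that JIT compilation of the method is not counted in the result.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs `method` on `input` `iterations` times and
    // returns the elapsed wall-clock time in milliseconds.
    public static long Measure(Func<string, string> method, string input, int iterations)
    {
        // Warm-up call so that JIT compilation is not part of the measurement.
        method(input);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A call such as TimingSketch.Measure(TestClass.Space3, "Hello, World", 1000000) would then be made from a Release build started outside the debugger.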

In your first method, you are using the overload of String.Join that operates on an IEnumerable<T>, which requires that the method walk the characters of the string using an enumerator. Internally, this uses a StringBuilder, as the exact number of characters is unknown.

Have you considered using the String.Join overload that takes a string array instead? That implementation allows a fixed-length buffer to be used (similar to your third method), along with some internal unsafe string operations for speed. Note that calling String.Join(" ", s) with the string itself does not achieve this: it binds to the params string[] overload with s as the only element and returns s unchanged. Each character would first have to be converted to its own one-character string, which adds an allocation per character. I haven't done the legwork to measure it, so I can't say whether this would be on par with your third approach.
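A small sketch to check how the String.Join(" ", s) call above is resolved, and what the string-array overload needs in order to insert separators. The class and method names are hypothetical; only String.Join and the LINQ Select and ToArray methods come from the framework.

```csharp
using System.Linq;

public static class JoinOverloadSketch
{
    // Passing the string itself: it is treated as a single string element,
    // so there is nothing to separate and the input comes back unchanged.
    public static string JoinWholeString(string s)
    {
        return string.Join(" ", s);
    }

    // Converting every char to a one-character string yields a real string[],
    // and the spaces are inserted between the elements.
    public static string JoinAsStringArray(string s)
    {
        string[] parts = s.Select(c => c.ToString()).ToArray();
        return string.Join(" ", parts);
    }
}
```

The second variant allocates one small string per character plus the array, which is overhead the char[] approach from the question does not have.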

The bad performance is not coming from String.Join itself, but from the way each character is handled. In this case, since characters have to be handled individually, your first method will create many more intermediate strings, and the second method suffers from two Append method calls for each character. Your third method does not involve lots of intermediate strings or method calls, and that's the reason why it is the fastest.

When you pass an IEnumerable to String.Join, it has no idea how much memory needs to be allocated. It allocates a chunk of memory, resizes it if it is insufficient, and repeats the process until it has enough memory to accommodate all the strings.

The array version is faster because we know ahead of time how much memory to allocate.
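On newer runtimes the same known-length idea can be taken one step further. The sketch below assumes .NET Core 2.1 or later, where string.Create is available; Space4 is a hypothetical name and the variant has not been measured here. It writes the characters directly into the new string's buffer, so the intermediate char[] and the copy made by the String constructor are not needed.

```csharp
using System;

public static class StringCreateSketch
{
    // Hypothetical Space4: same index arithmetic as the array version, but the
    // characters are written straight into the string being created.
    public static string Space4(string s)
    {
        if (s.Length < 2)
            return s;

        // Final length is n characters plus n - 1 spaces.
        return string.Create(s.Length * 2 - 1, s, (span, source) =>
        {
            span[0] = source[0];
            for (int i = 1; i < source.Length; i++)
            {
                span[2 * i - 1] = ' ';
                span[2 * i] = source[i];
            }
        });
    }
}
```

The input string is passed as the state argument rather than captured by the lambda, which avoids allocating a closure on every call.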

Also, please note that a garbage collection (GC) might have occurred while you were running the first version.
