
Why is C# string interpolation slower than regular string concatenation?

I am optimizing our debug print facilities (class). The class is fairly straightforward, with a global "enabled" bool and a PrineDebug routine.

I'm investigating the performance of the PrintDebug method in "disabled" mode, trying to create a framework with less impact on run time if no debug prints are needed.

During the exploration I came across the results below, which surprised me, and I wonder what I am missing here.

public class Profiler
{
     private bool isDebug = false;

     public void PrineDebug(string message)
     {
         if (isDebug)
         {
             Console.WriteLine(message);
         }
     }
}

[MemoryDiagnoser]
public class ProfilerBench
{
    private Profiler profiler = new Profiler();
    private int five = 5;
    private int six = 6;

    [Benchmark]
    public void DebugPrintConcat()
    {
        profiler.PrineDebug("sometext_" + five + "_" + six);
    }

    [Benchmark]
    public void DebugPrintInterpolated()
    {
        profiler.PrineDebug($"sometext_{five}_{six}");
    }
}

I ran this benchmark under BenchmarkDotNet. Here are the results:

|                 Method |     Mean |   Error |  StdDev |  Gen 0 | Allocated |
|----------------------- |---------:|--------:|--------:|-------:|----------:|
|       DebugPrintConcat | 149.0 ns | 3.02 ns | 6.03 ns | 0.0136 |      72 B |
| DebugPrintInterpolated | 219.4 ns | 4.13 ns | 6.18 ns | 0.0181 |      96 B |

I thought the Concat approach would be slower, as every + operation actually creates a new string (plus an allocation), but it seems the interpolation caused more allocation and took more time.

Can you explain?

TLDR: Interpolated strings are the best option overall; they only allocate more memory in your benchmark because you are using an old .NET version and the number strings are cached.

There's a lot to talk about here.

First off, a lot of people think string concatenation using + will always create a new string for every +. That might be the case in a loop, but if you chain several + operators in a single expression, the compiler replaces them with one call to string.Concat, making the complexity O(n), not O(n^2). Your DebugPrintConcat actually compiles to this:

public void DebugPrintConcat()
{
    profiler.PrineDebug(string.Concat("sometext_", five.ToString(), "_", six.ToString()));
}

Note that in your specific case you are not benchmarking string allocation for the integers, because .NET caches string instances for small numbers, so the .ToString() calls on five and six end up allocating nothing. The memory allocation would have been quite different with bigger numbers or with formatting (like .ToString("10:0000")).
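
One way to observe the caching described above is to compare the references returned by two ToString() calls. This is only a sketch: the class and method names are made up for illustration, and whether small numbers share an instance depends on the runtime version.

```csharp
using System;

// Hypothetical helper, not part of the original benchmark.
public static class NumberStringCacheSketch
{
    // Returns true when two ToString() calls hand back the very same string object,
    // which means the second call did not allocate a new string.
    public static bool ReturnsSameInstance(int value)
    {
        string first = value.ToString();
        string second = value.ToString();
        return ReferenceEquals(first, second);
    }
}
```

Calling ReturnsSameInstance(5) shows whether your runtime caches single-digit strings, while ReturnsSameInstance(116521345) returns false because each call has to allocate a fresh string.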

The three ways of concatenating strings are + (that is, string.Concat()), string.Format() and interpolated strings. Interpolated strings used to be exactly the same as string.Format(), as $"..." was just syntactic sugar for string.Format(), but that has not been the case since .NET 6, when they were redesigned around interpolated string handlers.

Another myth I have to address is that using string.Format() on structs always leads to first boxing the struct and then creating an intermediate string by calling .ToString() on the boxed struct. That is false: for years now, all primitive types have implemented ISpanFormattable, which lets string.Format() skip the intermediate string and write the string representation of the object directly into its internal buffer. ISpanFormattable became public with the release of .NET 6, so you can implement it for your own types too (more on that at the end of this answer).
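
To make "writing directly into the buffer" concrete, here is a minimal sketch that formats two ints into a stack-allocated char buffer with int.TryFormat, so the only heap allocation is the final string. It is not how string.Format() is implemented internally; the class and method names are invented for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of span-based formatting; not the framework's actual code.
public static class SpanFormattingSketch
{
    public static string Build(int first, int second)
    {
        // 64 chars is plenty for the literals plus two formatted ints.
        Span<char> buffer = stackalloc char[64];
        int position = 0;

        position += CopyLiteral("sometext_", buffer.Slice(position));
        position += FormatInt(first, buffer.Slice(position));
        position += CopyLiteral("_", buffer.Slice(position));
        position += FormatInt(second, buffer.Slice(position));

        // The only heap allocation: the resulting string.
        return buffer.Slice(0, position).ToString();
    }

    private static int CopyLiteral(string literal, Span<char> destination)
    {
        literal.AsSpan().CopyTo(destination);
        return literal.Length;
    }

    private static int FormatInt(int value, Span<char> destination)
    {
        // TryFormat writes the digits straight into the span, with no intermediate string.
        if (!value.TryFormat(destination, out int charsWritten, provider: CultureInfo.InvariantCulture))
        {
            throw new InvalidOperationException("Buffer too small.");
        }
        return charsWritten;
    }
}
```

SpanFormattingSketch.Build(5, 6) yields "sometext_5_6", the same text as the question's benchmark methods.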

About memory characteristics of each approach, ordered from worst to best:

  • string.Concat() (the overloads accepting objects, not strings) is the worst because it always boxes structs and creates intermediate strings (source: decompilation using ILSpy).
  • + and string.Concat() (the overloads accepting strings, not objects) are slightly better than the previous one: they still use intermediate strings, but they don't box structs.
  • string.Format() is generally better than the previous ones because, as mentioned earlier, it does need to box structs but does not create an intermediate string if the structs implement ISpanFormattable (which was internal to .NET until not too long ago, but the performance benefit was there nevertheless). Furthermore, string.Format() is much more likely to avoid allocating an object[] than the previous methods.
  • Interpolated strings are the best because, since the release of .NET 6, they don't box structs and they don't create intermediate strings for types implementing ISpanFormattable. The only allocation you will generally get with them is the returned string and nothing else.

To support the claims above, I'm adding a benchmark class and benchmark results below, making sure to avoid the situation in the original post where + performs best only because strings are cached for small ints:

[MemoryDiagnoser]
[RankColumn]
public class ProfilerBench
{
    private float pi = MathF.PI;
    private double e = Math.E;
    private int largeInt = 116521345;

    [Benchmark(Baseline = true)]
    public string StringPlus()
    {
        return "sometext_" + pi + "_" + e + "_" + largeInt + "...";
    }

    [Benchmark]
    public string StringConcatStrings()
    {
        // the string[] overload
        // the exact same as StringPlus()
        return string.Concat("sometext_", pi.ToString(), "_", e.ToString(), "_", largeInt.ToString(), "...");
    }

    [Benchmark]
    public string StringConcatObjects()
    {
        // the params object[] overload
        return string.Concat("sometext_", pi, "_", e, "_", largeInt, "...");
    }

    [Benchmark]
    public string StringFormat()
    {
        // the (format, object, object, object) overload
        // note that the methods above had to allocate an array unlike string.Format()
        return string.Format("sometext_{0}_{1}_{2}...", pi, e, largeInt);
    }

    [Benchmark]
    public string InterpolatedString()
    {
        return $"sometext_{pi}_{e}_{largeInt}...";
    }
}

Results are ordered by bytes allocated:

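|              Method |     Mean |   Error |  StdDev | Rank |  Gen 0 | Allocated |
|-------------------- |---------:|--------:|--------:|-----:|-------:|----------:|
| StringConcatObjects | 293.9 ns | 1.66 ns | 1.47 ns |    4 | 0.0386 |     488 B |
|          StringPlus | 266.8 ns | 2.04 ns | 1.91 ns |    2 | 0.0267 |     336 B |
| StringConcatStrings | 278.7 ns | 2.14 ns | 1.78 ns |    3 | 0.0267 |     336 B |
|        StringFormat | 275.7 ns | 1.46 ns | 1.36 ns |    3 | 0.0153 |     192 B |
|  InterpolatedString | 249.0 ns | 1.44 ns | 1.35 ns |    1 | 0.0095 |     120 B |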

If I edit the benchmark class to use more than three format arguments, then the difference between InterpolatedString and string.Format() will be even greater because of the array allocation:

[MemoryDiagnoser]
[RankColumn]
public class ProfilerBench
{
    private float pi = MathF.PI;
    private double e = Math.E;
    private int largeInt = 116521345;
    private float anotherNumber = 0.123456789f;

    [Benchmark]
    public string StringPlus()
    {
        return "sometext_" + pi + "_" + e + "_" + largeInt + "..." + anotherNumber;
    }

    [Benchmark]
    public string StringConcatStrings()
    {
        // the string[] overload
        // the exact same as StringPlus()
        return string.Concat("sometext_", pi.ToString(), "_", e.ToString(), "_", largeInt.ToString(), "...", anotherNumber.ToString());
    }

    [Benchmark]
    public string StringConcatObjects()
    {
        // the params object[] overload
        return string.Concat("sometext_", pi, "_", e, "_", largeInt, "...", anotherNumber);
    }

    [Benchmark]
    public string StringFormat()
    {
        // the (format, object[]) overload
        return string.Format("sometext_{0}_{1}_{2}...{3}", pi, e, largeInt, anotherNumber);
    }

    [Benchmark]
    public string InterpolatedString()
    {
        return $"sometext_{pi}_{e}_{largeInt}...{anotherNumber}";
    }
}

Benchmark results, again ordered by bytes allocated:

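|              Method |     Mean |   Error |  StdDev | Rank |  Gen 0 | Allocated |
|-------------------- |---------:|--------:|--------:|-----:|-------:|----------:|
| StringConcatObjects | 389.3 ns | 2.65 ns | 2.34 ns |    4 | 0.0477 |     600 B |
|          StringPlus | 350.7 ns | 1.88 ns | 1.67 ns |    2 | 0.0329 |     416 B |
| StringConcatStrings | 374.4 ns | 6.90 ns | 6.46 ns |    3 | 0.0329 |     416 B |
|        StringFormat | 390.4 ns | 2.01 ns | 1.88 ns |    4 | 0.0234 |     296 B |
|  InterpolatedString | 332.6 ns | 2.82 ns | 2.35 ns |    1 | 0.0114 |     144 B |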

EDIT: People might still think calling .ToString() on interpolated string handler arguments is a good idea. It is not: performance will suffer if you do it, and Visual Studio even kind of warns you not to. This does not apply only to .NET 6. Below you can see that even with string.Format(), which interpolated strings used to be syntactic sugar for, it is still bad to call .ToString():

[MemoryDiagnoser]
[RankColumn]
public class ProfilerBench
{
    private float pi = MathF.PI;
    private double e = Math.E;
    private int largeInt = 116521345;
    private float anotherNumber = 0.123456789f;

    [Benchmark]
    public string StringFormatGood()
    {
        // the (format, object[]) overload with boxing structs
        return string.Format("sometext_{0}_{1}_{2}...{3}", pi, e, largeInt, anotherNumber);
    }

    [Benchmark]
    public string StringFormatBad()
    {
        // the (format, object[]) overload with pre-converting the structs to strings
        return string.Format("sometext_{0}_{1}_{2}...{3}", 
            pi.ToString(), 
            e.ToString(), 
            largeInt.ToString(), 
            anotherNumber.ToString());
    }
}
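
|           Method |     Mean |   Error |  StdDev | Rank |  Gen 0 | Allocated |
|----------------- |---------:|--------:|--------:|-----:|-------:|----------:|
| StringFormatGood | 389.0 ns | 2.27 ns | 2.12 ns |    1 | 0.0234 |     296 B |
|  StringFormatBad | 442.0 ns | 4.62 ns | 4.09 ns |    2 | 0.0305 |     384 B |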

The explanation for these results is that it is cheaper to box the struct and have string.Format() write the string representation directly into its char buffer than to create an intermediate string explicitly and force string.Format() to copy from it.

If you want to read more about how interpolated string handlers work and how to make your own types implement ISpanFormattable, this is a good read: link
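
As a rough illustration of the handler mechanism applied to the question's "disabled" case, the sketch below uses a custom interpolated string handler so that, when debugging is off, the message is never built and the interpolated expressions are not even evaluated. It assumes C# 10 / .NET 6 or later; HandlerProfiler and DebugMessageHandler are hypothetical names, not types from the question or the framework.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical variant of the question's Profiler (requires C# 10 / .NET 6+).
public class HandlerProfiler
{
    public bool IsDebug { get; set; }

    // The "" argument name tells the compiler to pass `this` to the handler's constructor.
    public void PrintDebug([InterpolatedStringHandlerArgument("")] DebugMessageHandler message)
    {
        if (IsDebug)
        {
            Console.WriteLine(message.GetFormattedText());
        }
    }
}

[InterpolatedStringHandler]
public ref struct DebugMessageHandler
{
    private readonly StringBuilder builder;

    // When isEnabled comes back false, the compiler skips every Append call,
    // so the interpolation holes are never evaluated.
    public DebugMessageHandler(int literalLength, int formattedCount,
                               HandlerProfiler profiler, out bool isEnabled)
    {
        isEnabled = profiler.IsDebug;
        builder = isEnabled ? new StringBuilder(literalLength) : null;
    }

    public void AppendLiteral(string value) => builder.Append(value);

    // Kept simple on purpose: this goes through ToString() instead of span formatting.
    public void AppendFormatted<T>(T value) => builder.Append(value?.ToString());

    public string GetFormattedText() => builder == null ? string.Empty : builder.ToString();
}
```

A call such as profiler.PrintDebug($"sometext_{five}_{six}") keeps the same shape as in the question, but with IsDebug set to false it only pays for constructing the handler struct. The StringBuilder used when debugging is enabled is a simplification and does allocate.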

String concatenation is faster and lighter when you have to concatenate fewer than roughly 4 to 8 strings (an approximate figure), but as the number of strings to concatenate grows, it is better to use StringBuilder.

Internally, when you concatenate strings using '+' operator like

string test = "foo" + a + "bar";

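the concat() method is conceptually called for every concatenation (although the C# compiler folds a chain of + operators within one expression into a single string.Concat call):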

string t1 = "foo";
string t2 = a;
string t3 = concat(t1, t2);
string t4 = "bar";
string final = concat(t3, t4);

String interpolation, on the other hand, is nothing but syntactic sugar for String.Format() (at least before the .NET 6 redesign).

So, when you use string interpolation,

string text = $"Foo{a}Bar";

it would be converted to,

string text = string.Format("Foo{0}Bar", new object[] { a });

You can find the performance implications of these methods on MSDN. However, the performance difference is pretty much negligible for small-scale string building; for serious string building, it is much better to use a StringBuilder than interpolation or raw concatenation.
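
For the "serious string building" case, a minimal sketch might look like the following. The class name, method name, and the "item_" text are made up for illustration.

```csharp
using System.Text;

// Hypothetical example of building a long string in a loop.
public static class StringBuilderSketch
{
    public static string BuildItems(int count)
    {
        // One growable buffer instead of a new string per loop iteration.
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append("item_").Append(i).Append(';');
        }
        return builder.ToString();
    }
}
```

StringBuilderSketch.BuildItems(3) returns "item_0;item_1;item_2;". Using += on a string inside the same loop would instead copy the whole accumulated text on every iteration.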

Performance should not be the only factor when picking an approach: interpolation internally calls string.Format(), which lets you format your string (padding, decimal precision, date formatting, etc.), offering much more flexibility.
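
The formatting options mentioned above can be sketched as follows. The class and method names are hypothetical, and FormattableString.Invariant is used only so that the output does not depend on the machine's culture.

```csharp
using System;

// Hypothetical example of alignment and format specifiers in an interpolated string.
public static class FormatOptionsSketch
{
    public static string Describe(double ratio, int count, DateTime when)
    {
        // {ratio,8:F2}     -> right-aligned in 8 characters, 2 decimal places
        // {count,-6}       -> left-aligned in 6 characters
        // {when:yyyy-MM-dd} -> custom date format
        return FormattableString.Invariant(
            $"[{ratio,8:F2}] [{count,-6}] [{when:yyyy-MM-dd}]");
    }
}
```

For example, Describe(3.14159, 42, new DateTime(2024, 1, 2)) produces "[    3.14] [42    ] [2024-01-02]".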

Concatenation, formatting and string building each have their own use cases; it is up to you to decide which one suits your needs best.

I believe the problem here is just the boxing of ints. I tried to eliminate the boxing and got the same performance as for concatenation:

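|                        Method |      Mean |    Error |   StdDev |  Gen 0 | Allocated |
|------------------------------ |----------:|---------:|---------:|-------:|----------:|
|              DebugPrintConcat |  41.49 ns | 0.198 ns | 0.185 ns | 0.0046 |      48 B |
|        DebugPrintInterpolated | 103.07 ns | 0.257 ns | 0.227 ns | 0.0092 |      96 B |
| DebugPrintInterpolatedStrings |  41.36 ns | 0.211 ns | 0.198 ns | 0.0046 |      48 B |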

DebugPrintInterpolatedStrings code (I just added an explicit ToString):

    [Benchmark]
    public void DebugPrintInterpolatedStrings()
    {
        profiler.PrineDebug($"sometext_{five.ToString()}_{six.ToString()}");
    }

We can also note the reduced allocations (precisely because there are no additional boxed objects).

P.S. By the way, @GSerg already mentioned a post with the same explanation in the comments.
