
Addition of strings in C#: how does the compiler do it?

A = string.Concat("abc", "def");

B = "abc" + "def";

A vs. B

Lately I have been confused about why many people say that A is definitely much faster than B. But when I ask, they just say it is because somebody said so, or because that's just the way it is. I hope I can get a better explanation here.

How does the compiler treat these strings?

Thank you!

The very first thing I did when I joined the C# compiler team was rewrite the optimizer for string concatenations. Good times.

As already noted, concatenations of constant strings are done at compile time. Non-constant strings get some fancier treatment:

a + b --> String.Concat(a, b)
a + b + c --> String.Concat(a, b, c)
a + b + c + d --> String.Concat(a, b, c, d)
a + b + c + d + e --> String.Concat(new String[] { a, b, c, d, e })

The benefit of these optimizations is that String.Concat can look at all the arguments, determine the sum of their lengths, and then create one big string that can hold the whole result.
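To make the "sum the lengths, allocate once" idea concrete, here is a minimal sketch. ConcatAll is a hypothetical helper, not the actual String.Concat source; it treats null as empty (as String.Concat does) and, to stay within public APIs, fills a char[] first, which costs one extra copy that a real implementation can avoid by writing straight into the new string.

```csharp
#nullable enable
using System;

// Hypothetical helper (not the real String.Concat source): shows the
// "measure everything first, allocate once, then copy" strategy.
static string ConcatAll(params string?[] parts)
{
    // First pass: add up the lengths; null counts as empty, like String.Concat.
    int total = 0;
    foreach (string? part in parts)
        total += part?.Length ?? 0;

    // Allocate a single buffer that is exactly big enough for the result.
    char[] buffer = new char[total];

    // Second pass: copy each part into its slot.
    int offset = 0;
    foreach (string? part in parts)
    {
        if (part is null) continue;
        part.CopyTo(0, buffer, offset, part.Length);
        offset += part.Length;
    }
    return new string(buffer);
}

string a = "ab", b = "cd", d = "ef", e = "g";
string? c = null;
Console.WriteLine(ConcatAll(a, b, c, d, e));                      // abcdefg
Console.WriteLine(ConcatAll(a, b, c, d, e) == a + b + c + d + e); // True
```

Compare this with concatenating pairwise, where every intermediate result would be allocated and copied again.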

Here's an interesting one. Suppose you have a method M that returns a string:

s = M() + "";

If M() returns null, the result is the empty string (null + empty is empty). If M() does not return null, concatenating the empty string leaves the result unchanged. Therefore, this is actually optimized so that String.Concat is not called at all! It becomes

s = M() ?? "";

Neat, eh?
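A quick way to convince yourself that this rewrite is safe is to compare both forms at run time. ReturnsNull and ReturnsText below are hypothetical stand-ins for M(); the sketch only checks that the values agree, not what IL the compiler actually emits.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for the method M() from the text.
static string? ReturnsNull() => null;
static string ReturnsText() => "hello";

// For a string operand, x + "" and x ?? "" produce the same value:
// null becomes "", and a non-null string comes back unchanged.
Console.WriteLine((ReturnsNull() + "") == (ReturnsNull() ?? ""));  // True
Console.WriteLine((ReturnsText() + "") == (ReturnsText() ?? ""));  // True
Console.WriteLine((ReturnsNull() + "").Length);                    // 0
```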

In C#, the addition operator for strings is just syntactic sugar for String.Concat. You can verify that by opening the output assembly in a decompiler such as .NET Reflector.

Another thing to note is that if you have string literals (or constants) in your code, as in the example, the compiler even changes this to B = "abcdef".

But if you call String.Concat with two string literals or constants, String.Concat will still be called at run time, skipping that optimization, so the + operation would actually be faster.

So, to sum it up:

- stringA + stringB becomes String.Concat(stringA, stringB).
- "abc" + "def" becomes "abcdef".
- String.Concat("abc", "def") stays the same.
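One way to observe the compile-time folding from inside C# itself: only a constant expression may initialize a const, so the fact that "abc" + "def" is accepted shows the compiler evaluates it, while string.Concat("abc", "def") is rejected. The ReferenceEquals check relies on identical string literals in an assembly sharing one interned object, which is the usual .NET behavior; Folded and called are illustrative names.

```csharp
using System;

// "abc" + "def" is a constant expression, so it may initialize a const;
// the compiler stores the single literal "abcdef" in the assembly.
const string Folded = "abc" + "def";

// string.Concat(...) is an ordinary method call, so this would NOT compile:
// const string NotFolded = string.Concat("abc", "def");   // error CS0133
string called = string.Concat("abc", "def");   // evaluated at run time

// The folded constant is just another "abcdef" literal, so it is the same
// interned object as a literal "abcdef" written elsewhere in the code.
Console.WriteLine(object.ReferenceEquals(Folded, "abcdef"));  // True
Console.WriteLine(called == Folded);                         // True (same characters)
```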

Something else I just had to try:

In C++/CLI, "abc" + "def" + "ghi" is actually translated to String.Concat(String.Concat("abc", "def"), "ghi").

Actually, B is resolved at compile time. You will end up with B = "abcdef", whereas for A the concatenation is postponed until run time.

If the strings are literals, as in your question, then the concatenation of the strings assigned to B will be done at compile time. Your example translates to:

string a = string.Concat("abc", "def");
string b = "abcdef";

If the strings aren't literals, the compiler will translate the + operator into a Concat call.

So this...

string x = GetStringFromSomewhere();
string y = GetAnotherString();

string a = string.Concat(x, y);
string b = x + y;

...is translated to this at compile time:

string x = GetStringFromSomewhere();
string y = GetAnotherString();

string a = string.Concat(x, y);
string b = string.Concat(x, y);
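Since x + y on two strings becomes string.Concat(x, y), the two forms should agree for every input, including null. The sketch below checks that over a few sample values chosen purely for illustration; the behavior worth remembering is that a null operand is treated as an empty string.

```csharp
#nullable enable
using System;

// Sample operands, including an empty string and a null.
string?[] samples = { "abc", "", null, "def" };

// For every pair, x + y and string.Concat(x, y) should give the same string.
bool allEqual = true;
foreach (string? x in samples)
    foreach (string? y in samples)
        allEqual &= (x + y) == string.Concat(x, y);

Console.WriteLine(allEqual);               // True

string? none = null;
Console.WriteLine((none + none).Length);   // 0: null + null is ""
```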

In this particular case, the two are actually identical. The compiler will transform the second variant, the one using the + operator, into a call to Concat, the first variant.

Well, that is, if the two actually contained string variables that were concatenated.

This code:

B = "abc" + "def";

actually transforms into this, with no concatenation at all:

B = "abcdef";

The compiler can do this because the result of the addition can be computed at compile time.

However, if you were to use something like this:

A = String.Concat(stringVariable1, stringVariable2);
B = stringVariable1 + stringVariable2;

then those two will generate the same code.

However, I would like to know exactly what those "many" said, as I think it is something different.

What I think they said is that repeated string concatenation is bad, and that you should use StringBuilder or something similar instead.

For instance, if you do this:

String s = "test";
for (int index = 1; index <= 10000; index++)
    s = s + "test";

What happens then is that each iteration through the loop builds one new string and leaves the old one eligible for garbage collection.

Additionally, each new string gets all the contents of the old one copied into it, which means you'll be moving a large amount of memory around: the total amount copied grows quadratically with the number of iterations.
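To put a number on "a large amount of memory", the sketch below simply counts characters, assuming each s = s + "test" allocates a new string and writes every one of its characters once. It does not measure real memory traffic or time.

```csharp
using System;

// Counts the characters written by the loop  s = s + "test"  (10,000 iterations),
// assuming each concatenation builds a brand-new string and copies into it
// the old contents plus the 4 new characters.
const int Iterations = 10000;
int piece = "test".Length;

long length = piece;   // s starts as "test"
long copied = 0;
for (int index = 1; index <= Iterations; index++)
{
    length += piece;   // the new string is one "test" longer
    copied += length;  // every character of the new string gets written
}

Console.WriteLine($"final length:      {length}");   // 40004
Console.WriteLine($"characters copied: {copied}");   // 200060000
```

So roughly 200 million characters are written to produce a 40,004-character result, and doubling the iteration count roughly quadruples that figure. A builder that appends into spare capacity writes each appended character only a small, constant number of times on average.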

Whereas the following code:

StringBuilder sb = new StringBuilder("test");
for (int index = 1; index <= 10000; index++)
    sb.Append("test");

will instead use an internal buffer that is larger than needed, just in case you need to append more text to it. When that buffer becomes full, a larger one is allocated and the old one is left for garbage collection. (Since .NET Framework 4, StringBuilder instead links in an additional chunk, so the text already appended is not copied again.)
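If you can estimate the final size up front, you can pass it as the capacity so the buffer should not need to grow at all. A sketch, assuming the same 10,000-iteration loop; StringBuilder(string value, int capacity) is a real constructor, and expectedLength is just the arithmetic for this particular loop.

```csharp
using System;
using System.Text;

// Pre-size the builder: "test" plus 10,000 more copies of "test".
const int Iterations = 10000;
int expectedLength = "test".Length * (Iterations + 1);   // 40,004 characters

// Start with "test" and reserve room for the whole result.
var sb = new StringBuilder("test", expectedLength);
for (int index = 1; index <= Iterations; index++)
    sb.Append("test");

string result = sb.ToString();
Console.WriteLine(result.Length);                 // 40004
Console.WriteLine(sb.Capacity >= result.Length);  // True
```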

So in terms of memory and CPU usage, the latter variant is much better.

Other than that, I would try to avoid focusing too much on "is code variant X better than Y" beyond what you already know from experience. For instance, I now use StringBuilder just because I'm aware of this case, but that isn't to say that all the code I write that uses it actually needs it.

Try to avoid spending time micro-optimizing your code until you know you have a bottleneck. At that point, the usual advice still applies: measure first, then optimize.
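In that spirit, a rough way to time the two loops is System.Diagnostics.Stopwatch. Treat this as a sketch only: a single run includes JIT and garbage-collection noise, so for numbers you can trust, use a dedicated harness such as BenchmarkDotNet and a Release build.

```csharp
using System;
using System.Diagnostics;
using System.Text;

const int Iterations = 10000;

// Time repeated string concatenation.
var watch = Stopwatch.StartNew();
string s = "test";
for (int index = 1; index <= Iterations; index++)
    s = s + "test";
watch.Stop();
Console.WriteLine($"string +:      {watch.ElapsedMilliseconds} ms");

// Time the StringBuilder version, including the final ToString().
watch.Restart();
var sb = new StringBuilder("test");
for (int index = 1; index <= Iterations; index++)
    sb.Append("test");
string built = sb.ToString();
watch.Stop();
Console.WriteLine($"StringBuilder: {watch.ElapsedMilliseconds} ms");

// Both approaches must produce the same text; only the cost differs.
Console.WriteLine(s == built);   // True
```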
