
How does StringBuilder work internally in C#?

How does StringBuilder work?

What does it do internally? Does it use unsafe code? And why is it so fast (compared to the + operator)?

When you use the + operator to build up a string:

string s = "01";
s += "02";
s += "03";
s += "04";

then on the first concatenation we make a new string of length four and copy "01" and "02" into it: four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it: six characters are copied. On the third concatenation we make a string of length eight and copy "010203" and "04" into it: eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.

...
s += "99";

On the 98th concatenation we make a string of length 198 and copy "010203...98" and "99" into it. That gives a total of 4 + 6 + 8 + ... + 198 = 9,898 characters copied, just to make this one 198-character string.
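One way to check the arithmetic is to rebuild the same string with `+=` and add up the length of each new string, since every concatenation copies the whole result. This is a sketch assuming C# 9 or later (top-level statements); the `copied` counter exists only for illustration. For n two-digit pieces the total is n(n + 1) - 2 characters, which grows quadratically with n; for n = 99 that is 9,898.

```csharp
using System;

// Rebuild "0102...99" with +=, counting the characters each concatenation
// copies (the old contents plus the new two-digit suffix).
string s = "01";
long copied = 0;
for (int i = 2; i <= 99; i++)
{
    s += i.ToString("D2");   // allocates a new string of length s.Length + 2
    copied += s.Length;      // every character of the new string was copied into it
}
Console.WriteLine($"{s.Length} chars, {copied} chars copied"); // 198 chars, 9898 chars copied
```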

A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.
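You can supply that guess yourself through the `StringBuilder(int capacity)` constructor. A minimal sketch, assuming C# 9 or later; the capacity of 200 is an arbitrary choice that happens to cover the 198-character result:

```csharp
using System;
using System.Text;

// Reserve room for the whole result up front, so the builder's array is
// (as hoped) at least as large as the final string.
var sb = new StringBuilder(capacity: 200);
for (int i = 1; i <= 99; i++)
{
    sb.Append(i.ToString("D2"));   // writes into the existing buffer; no new string per step
}
string result = sb.ToString();      // a single string is created at the end
Console.WriteLine(result.Length);   // 198
```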

What happens when the guess is wrong and the array gets full? There are two strategies. In the previous version of the framework, when the array got full the string builder allocated a new one of double the size and copied the old contents into it. In the new implementation, the string builder maintains a linked list of relatively small arrays, and appends a new array onto the end of the list when the old one gets full.
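The chunked strategy can be illustrated with a toy model. This is a hypothetical sketch, not the framework's source: it uses a `List<char[]>` in place of a linked list, a deliberately tiny `ChunkSize` of 8, and made-up local functions `Append` and `Build` (C# 9 or later). The property it shows is that when the last chunk fills up, a new chunk is added and the characters already written are never moved; they are copied once, when the final string is built.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a chunked builder (hypothetical). Only the last chunk may be
// partly filled; all earlier chunks are full.
const int ChunkSize = 8; // illustrative size, small enough to force new chunks
var chunks = new List<char[]> { new char[ChunkSize] };
int lastUsed = 0;        // characters written into the last chunk

void Append(string value)
{
    int pos = 0;
    while (pos < value.Length)
    {
        if (lastUsed == ChunkSize)
        {
            // Last chunk is full: add a fresh one; existing data stays where it is.
            chunks.Add(new char[ChunkSize]);
            lastUsed = 0;
        }
        int n = Math.Min(ChunkSize - lastUsed, value.Length - pos);
        value.CopyTo(pos, chunks[^1], lastUsed, n);
        lastUsed += n;
        pos += n;
    }
}

// Copy every chunk, in order, into one array and create the final string.
string Build()
{
    int total = (chunks.Count - 1) * ChunkSize + lastUsed;
    var result = new char[total];
    for (int i = 0; i < chunks.Count; i++)
    {
        int n = i == chunks.Count - 1 ? lastUsed : ChunkSize;
        Array.Copy(chunks[i], 0, result, i * ChunkSize, n);
    }
    return new string(result);
}

for (int i = 1; i <= 99; i++) Append(i.ToString("D2"));
Console.WriteLine($"{Build().Length} chars in {chunks.Count} chunks"); // 198 chars in 25 chunks
```

The trade-off in this toy is that reading a single character by index means first finding the chunk that holds it, whereas a doubling array can be indexed directly.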

Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.
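The underlying idea (check the range once, then write without a check per character) can be sketched without unsafe code by using spans. This is a swapped-in, safe technique, not what StringBuilder itself does; the fixed 16-character `buffer` and the `Append` function are hypothetical, and growth is deliberately left out. `ReadOnlySpan<char>.CopyTo` validates the lengths up front and then copies the whole block, whereas a loop such as `buffer[length + i] = value[i]` carries a bounds check on each store unless the JIT can prove it redundant.

```csharp
using System;

// Fixed-size buffer for the sketch; this version never grows.
char[] buffer = new char[16];
int length = 0;

void Append(string value)
{
    // One explicit range check for the whole write...
    if (value.Length > buffer.Length - length)
        throw new InvalidOperationException("Buffer full; this sketch does not grow.");
    // ...then a single bulk copy instead of per-character indexed stores.
    value.AsSpan().CopyTo(buffer.AsSpan(length));
    length += value.Length;
}

Append("01");
Append("02");
Console.WriteLine(new string(buffer, 0, length)); // 0102
```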

StringBuilder's implementation has changed between versions, I believe. Fundamentally though, it maintains a mutable structure of some form. I believe it used to use a string which was still being mutated (using internal methods) and would just make sure it would never be mutated after it was returned.

The reason StringBuilder is faster than using string concatenation in a loop is precisely its mutability: it doesn't require a new string to be constructed after each mutation, which would mean copying all the data within the string, and so on.

For just a single concatenation, it's actually slightly more efficient to use + than to use StringBuilder. It's only when you're performing multiple operations and you don't really need the intermediate results that StringBuilder shines.
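To make the contrast concrete, here is the same comma-separated list built both ways. A sketch assuming C# 9 or later; `BuildBothWays` is a hypothetical helper. The results are identical: the `+=` loop creates and copies a new string on every iteration (each one an intermediate result nobody needs), while the StringBuilder loop writes into its buffer and creates one string at the end.

```csharp
using System;
using System.Text;

// Builds "0,1,2,...,n-1" with += and with StringBuilder; returns both results.
(string, string) BuildBothWays(int n)
{
    // Each += allocates a new string and copies everything built so far.
    string viaConcat = "";
    for (int i = 0; i < n; i++)
    {
        if (i > 0) viaConcat += ",";
        viaConcat += i;
    }

    // StringBuilder appends into its buffer; ToString() creates the only final string.
    var sb = new StringBuilder();
    for (int i = 0; i < n; i++)
    {
        if (i > 0) sb.Append(',');
        sb.Append(i);
    }
    return (viaConcat, sb.ToString());
}

var (a, b) = BuildBothWays(5);
Console.WriteLine(a);      // 0,1,2,3,4
Console.WriteLine(a == b); // True
```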

See my article on StringBuilder for more information.

The Microsoft CLR does do some operations with internal calls (not quite the same as unsafe code). The biggest performance benefit over a bunch of +-concatenated strings is that it writes to a char[] and doesn't create as many intermediate strings. When you call ToString(), it builds a completed, immutable string from your contents.
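Because `ToString()` returns an ordinary immutable string, later changes to the builder do not affect strings already taken from it. A short sketch, assuming C# 9 or later:

```csharp
using System;
using System.Text;

var sb = new StringBuilder("01");
sb.Append("02");
string first = sb.ToString();   // immutable snapshot: "0102"

sb.Append("03");                // keep mutating the builder afterwards
string second = sb.ToString();  // "010203"

Console.WriteLine(first);       // 0102 (unchanged by the later Append)
Console.WriteLine(second);      // 010203
```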

Unlike a regular String, which can't be altered, the StringBuilder uses a string buffer that can be. When you call the ToString method of the StringBuilder, it just freezes the string buffer and converts it into a regular string, so it doesn't have to copy all the data one extra time.

As the StringBuilder can alter the string buffer, it doesn't have to create a new string value for each and every change to the string data. When you use the + operator, the compiler turns that into a String.Concat call that creates a new string object. This seemingly innocent piece of code:

str += ",";

compiles into this:

str = String.Concat(str, ",");
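The effect of that rewrite is easy to observe: `+=` leaves the original string object untouched and gives `str` a brand-new one. A sketch assuming C# 9 or later; the variable names are illustrative.

```csharp
using System;

string original = "a";
string str = original;

str += ",";                                      // what you write
string viaConcat = String.Concat(original, ","); // the call the compiler emits for it

Console.WriteLine(str == viaConcat);                    // True: same contents
Console.WriteLine(object.ReferenceEquals(str, original)); // False: += produced a new object
Console.WriteLine(original);                            // a (the old string is unchanged)
```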
