Efficient string concatenation algorithm

This is a follow-up question to:
Concatenation by += slower than that by using StringBuilder()
where I found out that string1 += string2 is much slower than string1.append(string2) when string1 is a StringBuilder.

My follow-up question:
What does StringBuilder internally do, that makes it much faster?

string1 += string2 

is basically equivalent to

StringBuilder b = new StringBuilder(string1);
b.append(string2);
string1 = b.toString();

or to

StringBuilder b = new StringBuilder();
b.append(string1);
b.append(string2);
string1 = b.toString();

depending on the compiler (thanks @JonK)

So, if you concatenate using += in a loop, you end up creating a lot of temporary String and StringBuilder instances, larger and larger at each iteration, and you also force the garbage collector to collect all those temporary objects.
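To make the difference concrete, here is a minimal sketch of the two loop styles side by side. The class and method names (ConcatLoop, withPlusEquals, withBuilder) are made up for illustration; both methods produce the same result, only the number of temporary objects differs.

```java
public class ConcatLoop {

    // Uses += in a loop: every iteration produces a brand-new String
    // holding everything built so far plus the new piece.
    static String withPlusEquals(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";
        }
        return s;
    }

    // Uses one StringBuilder for the whole loop: pieces are appended
    // to the same buffer and a String is created only once, at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both approaches build the same text.
        System.out.println(withPlusEquals(5).equals(withBuilder(5)));
    }
}
```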

Whereas if you use a StringBuilder, you create a single object, containing a single buffer that holds everything until you end the loop and convert the buffer to a String. The buffer can become too small, but the algorithm then creates a new one that is roughly twice the size of the original, which means that even if you append thousands of strings, the number of buffers created is small. Much smaller than the number of String and StringBuilder instances needed by +=.
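The growth behaviour can be observed with StringBuilder.capacity(). The sketch below counts how often the reported capacity changes while appending one character at a time. The class and method names (BufferGrowth, countReallocations) are made up for illustration, and the exact growth policy is an implementation detail of the JDK in use, so the precise count may vary; the point is only that it stays small compared with the number of appends.

```java
public class BufferGrowth {

    // Appends `chars` single characters and counts how many times
    // the builder's capacity changed, i.e. how many times a new,
    // larger internal buffer had to be allocated.
    static int countReallocations(int chars) {
        StringBuilder sb = new StringBuilder();
        int capacity = sb.capacity();
        int reallocations = 0;
        for (int i = 0; i < chars; i++) {
            sb.append('x');
            if (sb.capacity() != capacity) {
                capacity = sb.capacity();
                reallocations++;
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        int appends = 10_000;
        System.out.println(appends + " appends, "
                + countReallocations(appends) + " buffer reallocations");
    }
}
```

If the final size is known in advance, passing it to the constructor, as in new StringBuilder(expectedLength), avoids even these few reallocations.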

StringBuilder is not inherently faster than +=. It is only faster when used in a loop or when you need to concatenate multiple parts, because the intermediate results do not have to be turned into String objects that are immediately thrown away.

In fact, when you only need to add a single suffix outside a loop, += is not only easier to read, but can also be optimized better.

I ran a JMH test to back up my statement. The results show that in the single-shot case StringBuilder and += (which internally uses a StringBuilder as well) run at nearly the same speed (with += having a slight advantage):

Benchmark                          (prefixLen)  Mode  Samples    Score  Score error  Units
n.e.j.StringConcat.plus                      0  avgt        5  121,327       17,461  ns/op
n.e.j.StringConcat.plus                     10  avgt        5  126,112       12,381  ns/op
n.e.j.StringConcat.plus                    100  avgt        5  181,492       17,012  ns/op
n.e.j.StringConcat.plusEquals                0  avgt        5  120,520        5,899  ns/op
n.e.j.StringConcat.plusEquals               10  avgt        5  131,687        7,651  ns/op
n.e.j.StringConcat.plusEquals              100  avgt        5  173,384        5,096  ns/op
n.e.j.StringConcat.stringBuffer              0  avgt        5  118,633        9,356  ns/op
n.e.j.StringConcat.stringBuffer             10  avgt        5  125,862       11,661  ns/op
n.e.j.StringConcat.stringBuffer            100  avgt        5  173,753        3,205  ns/op

(The scores use a decimal comma. The numbers show that all three methods perform essentially the same, with the differences falling within the reported error. Please be aware that this is not a very rigorous approach to this kind of microbenchmark.)

The code for that test is here: https://gist.github.com/ecki/5bcc3a4de669414401b6

And just to reiterate: yes, StringBuffer is much more efficient if you need to do multiple concatenations to the same string, but when I only need a one-shot append I prefer the + operator, and not only for readability reasons.

I know that StringBuilder uses a char array to store the characters. It is more efficient than String because that array is resized only when necessary. With String, if you have String a = "foo"; String b = "foo"; and you execute a = a + b; the old reference stored in a is dropped and a new String object is created (String is immutable).
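A small sketch of that last point, with made-up names (ImmutableDemo, concatKeepsOriginal): keeping a second reference to the original string shows that a = a + b leaves the old object untouched and makes a point to a new one.

```java
public class ImmutableDemo {

    // Returns { the original string, the concatenated string }.
    static String[] concatKeepsOriginal() {
        String a = "foo";
        String b = "foo";
        String original = a; // second reference to the same "foo" object
        a = a + b;           // creates a new String; "foo" is not modified
        return new String[] { original, a };
    }

    public static void main(String[] args) {
        String[] result = concatKeepsOriginal();
        System.out.println(result[0]); // still the original contents
        System.out.println(result[1]); // the newly created string
    }
}
```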
