
Why does the Oracle Java compiler prefer the no-args StringBuilder constructor?

Purely out of interest I've been looking at how the Oracle Java compiler handles String concatenation and I'm seeing something I didn't expect.

Given the following code:

public class StringTest {
    public static void main(String... args) {
        String s = "Test" + getSpace() + "String.";
        System.out.println(s.toString());
    }

    // Stops the compiler optimising the concatenations down to a
    // single string literal.
    static String getSpace() {
        return " ";
    }
}

I expected that the compiler would optimise it to the equivalent of:

String s = new StringBuilder("Test").append(getSpace())
                   .append("String.").toString();

But it actually compiles down to the equivalent of:

String s = new StringBuilder().append("Test").append(getSpace())
                   .append("String.").toString();

I'm compiling this using the 32-bit jdk1.7.0_55 release. This is the output of javap -v -l:

public class StringTest
  SourceFile: "StringTest.java"
  minor version: 0
  major version: 51
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #14.#25        //  java/lang/Object."<init>":()V
   #2 = Class              #26            //  java/lang/StringBuilder
   #3 = Methodref          #2.#25         //  java/lang/StringBuilder."<init>":()V
   #4 = String             #27            //  Test
   #5 = Methodref          #2.#28         //  java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   #6 = Methodref          #13.#29        //  StringTest.getSpace:()Ljava/lang/String;
   #7 = String             #30            //  String.
   #8 = Methodref          #2.#31         //  java/lang/StringBuilder.toString:()Ljava/lang/String;
   #9 = Fieldref           #32.#33        //  java/lang/System.out:Ljava/io/PrintStream;
  #10 = Methodref          #34.#31        //  java/lang/String.toString:()Ljava/lang/String;
  #11 = Methodref          #35.#36        //  java/io/PrintStream.println:(Ljava/lang/String;)V
  #12 = String             #37            //
  #13 = Class              #38            //  StringTest
  #14 = Class              #39            //  java/lang/Object
  #15 = Utf8               <init>
  #16 = Utf8               ()V
  #17 = Utf8               Code
  #18 = Utf8               LineNumberTable
  #19 = Utf8               main
  #20 = Utf8               ([Ljava/lang/String;)V
  #21 = Utf8               getSpace
  #22 = Utf8               ()Ljava/lang/String;
  #23 = Utf8               SourceFile
  #24 = Utf8               StringTest.java
  #25 = NameAndType        #15:#16        //  "<init>":()V
  #26 = Utf8               java/lang/StringBuilder
  #27 = Utf8               Test
  #28 = NameAndType        #40:#41        //  append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  #29 = NameAndType        #21:#22        //  getSpace:()Ljava/lang/String;
  #30 = Utf8               String.
  #31 = NameAndType        #42:#22        //  toString:()Ljava/lang/String;
  #32 = Class              #43            //  java/lang/System
  #33 = NameAndType        #44:#45        //  out:Ljava/io/PrintStream;
  #34 = Class              #46            //  java/lang/String
  #35 = Class              #47            //  java/io/PrintStream
  #36 = NameAndType        #48:#49        //  println:(Ljava/lang/String;)V
  #37 = Utf8
  #38 = Utf8               StringTest
  #39 = Utf8               java/lang/Object
  #40 = Utf8               append
  #41 = Utf8               (Ljava/lang/String;)Ljava/lang/StringBuilder;
  #42 = Utf8               toString
  #43 = Utf8               java/lang/System
  #44 = Utf8               out
  #45 = Utf8               Ljava/io/PrintStream;
  #46 = Utf8               java/lang/String
  #47 = Utf8               java/io/PrintStream
  #48 = Utf8               println
  #49 = Utf8               (Ljava/lang/String;)V
{
  public StringTest();
    flags: ACC_PUBLIC
    LineNumberTable:
      line 2: 0
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 2: 0

  public static void main(java.lang.String...);
    flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
    LineNumberTable:
      line 4: 0
      line 5: 27
      line 6: 37
    Code:
      stack=2, locals=2, args_size=1
         0: new           #2                  // class java/lang/StringBuilder
         3: dup
         4: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
         7: ldc           #4                  // String Test
         9: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        12: invokestatic  #6                  // Method getSpace:()Ljava/lang/String;
        15: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        18: ldc           #7                  // String String.
        20: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        23: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
        26: astore_1
        27: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
        30: aload_1
        31: invokevirtual #10                 // Method java/lang/String.toString:()Ljava/lang/String;
        34: invokevirtual #11                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
        37: return
      LineNumberTable:
        line 4: 0
        line 5: 27
        line 6: 37

  static java.lang.String getSpace();
    flags: ACC_STATIC
    LineNumberTable:
      line 10: 0
    Code:
      stack=1, locals=0, args_size=0
         0: ldc           #12                 // String
         2: areturn
      LineNumberTable:
        line 10: 0
}

Anecdotally, I've read here that the ECJ compiler does compile this down to the constructor that takes a String argument (although I haven't verified it myself), so my question is: why doesn't Oracle's compiler make the same optimisation?


Based on the comments, I ran another test using a longer String so as to immediately exceed the default capacity of the StringBuilder's backing char[]:

public class StringTest {
    public static void main(String... args) {
        String s = "Testing a much, much longer " + getSpace() + "String.";
        System.out.println(s.toString());
    }

    // Stops the compiler optimising the concatenations down to a single string literal
    static String getSpace() {
        return " ";
    }
}

Apart from the slightly different literal contents, the generated bytecode is exactly the same: it still uses the no-args constructor to instantiate the StringBuilder before appending to it. In this situation the String-argument constructor version should, as far as I can tell, out-perform the no-args one, because the no-args version has to resize the backing char[] on the first call to append(), and may have to resize it again on the next append() if the appended String is particularly large.
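
The capacity argument can be checked directly with StringBuilder.capacity(). The following is a minimal sketch, relying on the documented initial capacities (16 characters for the no-args constructor, the argument's length plus 16 for the String constructor). The class and method names are made up for illustration, and the exact growth policy after a resize is an implementation detail, so only a lower bound is mentioned for it:

```java
// Hypothetical helper class for illustration; not part of the original question.
final class CapacityDemo {

    static final String LONG_LITERAL = "Testing a much, much longer ";

    // Capacity of a builder created the way javac creates it: new StringBuilder().
    static int noArgsCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity of a builder created with the String-argument constructor.
    static int stringArgCapacity(String first) {
        return new StringBuilder(first).capacity();
    }

    // Capacity of a no-args builder after the first append. If 'first' is
    // longer than the initial capacity, the backing array has been resized.
    static int capacityAfterFirstAppend(String first) {
        return new StringBuilder().append(first).capacity();
    }

    public static void main(String... args) {
        System.out.println("literal length:               " + LONG_LITERAL.length());
        System.out.println("no-args capacity:             " + noArgsCapacity());
        System.out.println("String-arg capacity:          " + stringArgCapacity(LONG_LITERAL));
        System.out.println("no-args capacity after append: " + capacityAfterFirstAppend(LONG_LITERAL));
    }
}
```

With the longer literal, the no-args builder starts below the literal's length and has to grow on the first append, whereas the String-argument builder starts with 16 characters of headroom beyond the literal.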


On AnubianNoob's suggestion I did a quick performance test of System.arraycopy(...) to see if it was indeed optimised for empty arrays. This is the code used:

public class ArrayCopyTest {
    public static void main(String... args) {

        char[] array = new char[16];
        final long test1Start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            System.arraycopy(array, 0, array, 0, array.length);
        }

        final long test1End = System.nanoTime();
        System.out.println("Elapsed Time (empty array copies)");
        System.out.println("=================================");
        System.out.println((test1End - test1Start) + "ns");

        char[] array2 = new char[] {'0', '1', '2', '3', '4', '5', '6', '7', '8',
             '9', 'a', 'b', 'c', 'd', 'e', 'f'};

        final long test2Start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            System.arraycopy(array2, 0, array2, 0, array2.length);
        }

        final long test2End = System.nanoTime();
        System.out.println("Elapsed Time (non-empty array copies)");
        System.out.println("=====================================");
        System.out.println((test2End - test2Start) + "ns");
    }
}

Running this on a Windows 7.1 32-bit machine with an i7-2600 CPU @ 3.40 GHz 3.39 GHz and 3.24 GB of usable RAM produced:

Elapsed Time (empty array copies)
=================================
26660199ns
Elapsed Time (non-empty array copies)
=====================================
19431962ns

I ran this about five times just to be sure. It actually appears that it performs better over a million iterations when the array isn't empty. As Mike Strobel correctly pointed out, the above isn't a meaningful benchmark.
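
One reason the numbers above are hard to interpret is that the first loop also pays for JIT warm-up, and both arrays are 16 chars long, so the two loops copy the same amount of data. A slightly more careful hand-rolled sketch follows: it runs both cases in a warm-up phase and then times several alternating rounds. It is still only a sketch (a dedicated harness such as JMH is the usual tool for this), and the class name, helper method and round counts are arbitrary choices made for illustration:

```java
// Hypothetical timing sketch; not a rigorous benchmark.
final class ArrayCopyTimingSketch {

    static final int ITERATIONS = 1_000_000;

    // Times ITERATIONS self-copies of the given array and returns elapsed ns.
    static long timeCopies(char[] array) {
        final long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            System.arraycopy(array, 0, array, 0, array.length);
        }
        return System.nanoTime() - start;
    }

    public static void main(String... args) {
        char[] zeroed = new char[16];
        char[] filled = "0123456789abcdef".toCharArray();

        // Warm-up: give the JIT a chance to compile timeCopies before measuring.
        for (int round = 0; round < 10; round++) {
            timeCopies(zeroed);
            timeCopies(filled);
        }

        // Measured rounds, alternating the two cases so neither always runs first.
        for (int round = 0; round < 5; round++) {
            System.out.println("zeroed: " + timeCopies(zeroed) + "ns, filled: "
                    + timeCopies(filled) + "ns");
        }
    }
}
```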

Probably because the StringBuilder(String) constructor calls append() anyway:

public StringBuilder(String str) {
    super(str.length() + 16);
    append(str);
}

I think this is just laziness. Why? Because if you pick the constructor that takes an argument, you need further checks: you have to check whether the first expression to be concatenated is a String. If it is, you can use that constructor; otherwise, you have to fall back to the no-arg constructor. That is a lot more logic than simply always using the no-arg constructor.

If I were that compiler developer, I would have chosen the easy way too, since implicit string concatenation is surely not the bottleneck in many applications, and the difference is so small that it is just not worth the hassle.

Most people think of compilers as magic programs designed by superhumans that always do the best thing. But this is not true: compilers are also written by ordinary programmers, who do not always spend hours thinking about the best way to compile any specific thing. They have tight schedules and features to get done, so the easiest solution is often the one chosen.
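
To make the need for that check concrete: StringBuilder also has an int constructor that sets the initial capacity, and the String constructor rejects null, so blindly passing the first operand to a constructor would change the program's behaviour. A small sketch (the class and method names are hypothetical):

```java
// Hypothetical demo class; shows why the first operand cannot simply be
// moved into the StringBuilder constructor.
final class FirstOperandDemo {

    // What the concatenation (first + rest) means: the char is appended as text.
    static String viaAppend(char first, String rest) {
        return new StringBuilder().append(first).append(rest).toString();
    }

    // Naively moving the first operand into the constructor: the char is
    // widened to int and selects StringBuilder(int capacity), so it only
    // sets the initial capacity and contributes no characters.
    static String viaConstructor(char first, String rest) {
        return new StringBuilder(first).append(rest).toString();
    }

    // A null String operand concatenates as "null" ...
    static String nullViaAppend() {
        String first = null;
        return new StringBuilder().append(first).append("x").toString();
    }

    // ... but StringBuilder(String) throws NullPointerException for null.
    static boolean constructorRejectsNull() {
        String first = null;
        try {
            new StringBuilder(first);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String... args) {
        System.out.println(viaAppend('a', "bc"));      // prints "abc"
        System.out.println(viaConstructor('a', "bc")); // prints "bc"
        System.out.println(nullViaAppend());           // prints "nullx"
        System.out.println(constructorRejectsNull());  // prints "true"
    }
}
```

Neither problem applies when the first operand is a String literal, as in the question, but the compiler would have to establish that before choosing the constructor.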

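This may be because the JVM optimizes string concatenation, and it can probably recognize the string-concatenation pattern in the bytecode more easily the way it is implemented now.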

As another person has mentioned, the StringBuilder(String) constructor calls append() itself, and it's a lot more readable and consistent to call append() yourself.

Consider:

new StringBuilder("Hello").append("World");
new StringBuilder().append("Hello").append("World");

This might not be the best example, but two appends are a lot easier to read than passing the first string into the constructor. And the speed is the same.

By the way, there are related issues in the JDK issue tracker: JDK-4059189 and the issues linked to it. The initial proposal is dated 1997! And there is not much discussion there. This suggests that the issue is either considered unimportant or that this case is optimized by the JIT.
