
Should multiplication/bit-shift optimization be visible in Java bytecode?

I keep reading that bit shifting is not needed, as compiler optimizations will translate a multiplication into a bit shift, for example in "Should I bit-shift to divide by 2 in Java?" and "Is shifting bits faster than multiplying and dividing in Java? .NET?".

I am not inquiring here about the performance difference; I could test that out myself. But I find it curious that several people mention that it will be "compiled to the same thing", which seems not to be true. I have written a small piece of code:

private static void multi()
{
    int a = 3;
    int b = a * 2;
    System.out.println(b);
}

private static void shift()
{
    int a = 3;
    int b = a << 1L;
    System.out.println(b);
}

Both give the same result and just print it out.

When I look at the generated Java Bytecode, the following is shown.

private static void multi();
Code:
   0: iconst_3
   1: istore_0
   2: iload_0
   3: iconst_2
   4: imul
   5: istore_1
   6: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
   9: iload_1
  10: invokevirtual #5                  // Method java/io/PrintStream.println:(I)V
  13: return

private static void shift();
Code:
   0: iconst_3
   1: istore_0
   2: iload_0
   3: iconst_1
   4: ishl
   5: istore_1
   6: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
   9: iload_1
  10: invokevirtual #5                  // Method java/io/PrintStream.println:(I)V
  13: return

Now we can see the difference between "imul" and "ishl".

My question is: clearly the optimization being spoken of is not visible in the Java bytecode. I am still assuming the optimization does happen, so does it just happen at a lower level? Or, alternatively, because it is Java, does the JVM, when encountering the imul instruction, somehow know that it should be translated into something else? If so, any resources on how this is handled would be greatly appreciated.

(As a side note, I am not trying to justify the need for bit shifting. I think it decreases readability, at least to people used to Java; for C++ that might be different. I am just trying to see where the optimization happens.)

The question in the title sounds a bit different from the question in the text. The quoted statement that the shift and the multiplication would be "compiled to the same thing" is true. But it does not yet apply to the bytecode.

In general, Java bytecode is rather unoptimized. Very few optimizations are done at all, mainly the inlining of constants. Beyond that, Java bytecode is just an intermediate representation of the original program, and the translation from Java to Java bytecode is done rather "literally".
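The constant inlining mentioned above is easy to observe. In the following sketch (the class name ConstantFolding is mine), the product of two literals is a compile-time constant expression, so javac folds it away, while the same multiplication with a parameter survives into the bytecode:

```java
public class ConstantFolding {
    // 2 * 3 is a compile-time constant expression, so javac folds it:
    // javap -c shows only "bipush 6" here, no imul at all.
    static int folded() {
        return 2 * 3;
    }

    // With a parameter the expression is no longer constant, so the
    // bytecode keeps the multiplication (iload_0, iconst_3, imul).
    static int notFolded(int a) {
        return a * 3;
    }

    public static void main(String[] args) {
        System.out.println(folded());     // 6
        System.out.println(notFolded(2)); // 6
    }
}
```

Compiling this and running javap -c ConstantFolding shows the difference between the two methods directly.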

(Which, I think, is a good thing. The bytecode still closely resembles the original Java code. All the nitty-gritty (platform-specific!) optimizations that may be possible are left to the virtual machine, which has far more options here.)

All further optimizations, like arithmetic optimizations, dead code elimination or method inlining, are done by the JIT (Just-In-Time compiler) at runtime. The JIT compiler also applies the optimization of replacing the multiplication with a bit shift.
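What licenses this strength reduction is that, for int values, multiplying by a power of two and shifting left wrap identically modulo 2^32, including for negative and overflowing inputs. A small sanity check (the class name ShiftEquivalence is mine):

```java
import java.util.Random;

public class ShiftEquivalence {
    // True iff x * 256 and x << 8 agree for the given value.
    static boolean agrees(int x) {
        return x * 256 == (x << 8);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int x = rnd.nextInt(); // includes negative and overflowing values
            if (!agrees(x)) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        // Both expressions wrap identically in two's-complement arithmetic,
        // so the JIT may substitute one for the other unconditionally.
        System.out.println("x * 256 == x << 8 for all sampled ints");
    }
}
```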

The example that you gave makes it a bit difficult to show the effect, for several reasons. The fact that System.out.println was included in the method tends to make the actual machine code large, due to inlining and the general prerequisites for calling this method. But more importantly, the shift by 1, which corresponds to a multiplication by 2, also corresponds to an addition of the value to itself. So instead of observing a shl (left-shift) assembly instruction in the resulting machine code, you'd likely see a disguised add instruction, in both the multi and the shift method.
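That three-way equivalence (multiply, add, shift) holds for every int, which is exactly why the JIT is free to pick whichever form is cheapest on the target CPU. A quick check (the class name TimesTwo is mine):

```java
public class TimesTwo {
    // For every int, doubling via multiply, add, or shift is identical,
    // so the JIT may lower x * 2 to whatever the target CPU does fastest,
    // often an add (or an x86 lea), not necessarily a shl.
    static boolean allAgree(int x) {
        return x * 2 == x + x && x + x == (x << 1);
    }

    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 3, -3, 12345, -12345,
                          Integer.MAX_VALUE, Integer.MIN_VALUE };
        for (int x : samples) {
            if (!allAgree(x)) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("x * 2 == x + x == x << 1 for all samples");
    }
}
```

Note that even Integer.MAX_VALUE agrees: all three forms wrap to the same value.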

However, here is a very pragmatic example that does a left-shift by 8, corresponding to a multiplication with 256:

class BitShiftOptimization
{
    public static void main(String args[])
    {
        int blackHole = 0;
        for (int i=0; i<1000000; i++)
        {
            blackHole += testMulti(i);
            blackHole += testShift(i);
        }
        System.out.println(blackHole);

    }

    public static int testMulti(int a)
    {
        int b = a * 256;
        return b;
    }

    public static int testShift(int a)
    {
        int b = a << 8L;
        return b;
    }
}

(It receives the value to be shifted as an argument, to prevent it from being optimized away into a constant. It calls the methods many times, to trigger the JIT. And it returns and collects the values from both methods, to prevent the method calls from being optimized away. Again, this is very pragmatic, but sufficient to show the effect.)

Running this on a HotSpot JVM with the hsdis disassembler plugin available, using

java -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintInlining -XX:+PrintAssembly BitShiftOptimization

will produce the following assembler code for the testMulti method:

Decoding compiled method 0x000000000286fbd0:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000001c0003b0} 'testMulti' '(I)I' in 'BitShiftOptimization'
  # parm0:    rdx       = int
  #           [sp+0x40]  (sp of caller)
  0x000000000286fd20: mov    %eax,-0x6000(%rsp)
  0x000000000286fd27: push   %rbp
  0x000000000286fd28: sub    $0x30,%rsp
  0x000000000286fd2c: movabs $0x1c0005a8,%rax   ;   {metadata(method data for {method} {0x000000001c0003b0} 'testMulti' '(I)I' in 'BitShiftOptimization')}
  0x000000000286fd36: mov    0xdc(%rax),%esi
  0x000000000286fd3c: add    $0x8,%esi
  0x000000000286fd3f: mov    %esi,0xdc(%rax)
  0x000000000286fd45: movabs $0x1c0003a8,%rax   ;   {metadata({method} {0x000000001c0003b0} 'testMulti' '(I)I' in 'BitShiftOptimization')}
  0x000000000286fd4f: and    $0x1ff8,%esi
  0x000000000286fd55: cmp    $0x0,%esi
  0x000000000286fd58: je     0x000000000286fd70  ;*iload_0
                        ; - BitShiftOptimization::testMulti@0 (line 17)

  0x000000000286fd5e: shl    $0x8,%edx
  0x000000000286fd61: mov    %rdx,%rax
  0x000000000286fd64: add    $0x30,%rsp
  0x000000000286fd68: pop    %rbp
  0x000000000286fd69: test   %eax,-0x273fc6f(%rip)        # 0x0000000000130100
                        ;   {poll_return}
  0x000000000286fd6f: retq   
  0x000000000286fd70: mov    %rax,0x8(%rsp)
  0x000000000286fd75: movq   $0xffffffffffffffff,(%rsp)
  0x000000000286fd7d: callq  0x000000000285f160  ; OopMap{off=98}
                        ;*synchronization entry
                        ; - BitShiftOptimization::testMulti@-1 (line 17)
                        ;   {runtime_call}
  0x000000000286fd82: jmp    0x000000000286fd5e
  0x000000000286fd84: nop
  0x000000000286fd85: nop
  0x000000000286fd86: mov    0x2a8(%r15),%rax
  0x000000000286fd8d: movabs $0x0,%r10
  0x000000000286fd97: mov    %r10,0x2a8(%r15)
  0x000000000286fd9e: movabs $0x0,%r10
  0x000000000286fda8: mov    %r10,0x2b0(%r15)
  0x000000000286fdaf: add    $0x30,%rsp
  0x000000000286fdb3: pop    %rbp
  0x000000000286fdb4: jmpq   0x0000000002859420  ;   {runtime_call}
  0x000000000286fdb9: hlt    
  0x000000000286fdba: hlt    
  0x000000000286fdbb: hlt    
  0x000000000286fdbc: hlt    
  0x000000000286fdbd: hlt    
  0x000000000286fdbe: hlt    

(The code for the testShift method contains the same instructions, by the way.)

The relevant line here is

  0x000000000286fd5e: shl    $0x8,%edx

which corresponds to the left-shift by 8.
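As a final caveat: for multiplication this substitution is always safe, but for division by powers of two (the case in one of the linked questions) it is not, because Java's / truncates toward zero while an arithmetic right shift floors. A minimal illustration (the class name DivisionCaveat is mine):

```java
public class DivisionCaveat {
    // Java's integer division truncates toward zero...
    static int divideByTwo(int x) {
        return x / 2;
    }

    // ...while an arithmetic right shift rounds toward negative infinity,
    // so the two differ for odd negative values.
    static int shiftRightOne(int x) {
        return x >> 1;
    }

    public static void main(String[] args) {
        System.out.println(divideByTwo(-3));   // -1
        System.out.println(shiftRightOne(-3)); // -2
    }
}
```

This is why the JIT must emit a sign fix-up when it strength-reduces a division, and why a hand-written ">> 1" as a "fast divide" can be a silent bug for negative inputs.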
