Strange method invocation optimization issue

Question

I have been investigating why DataInputStream.readByte() is so slow and found an interesting but puzzling issue. I'm using jdk1.7.0_40 on Windows 7 64-bit.

Suppose we have a huge byte array and read data from it. Let's compare 4 ways of reading this array byte by byte:

reading via a simple loop
reading via ByteArrayInputStream -> DataInputStream
reading via ByteArrayInputStream -> our own DataInputStream implementation (MyDataInputStream)
reading via ByteArrayInputStream and a copy of the readByte() method from DataInputStream

I got the following results (after letting the test loop run for a long time):

Loop took approx. 312446094 ns
DataInputStream took approx. 2555898090 ns
MyDataInputStream took approx. 2630664298 ns
Via the readByte() method copy took approx. 309265568 ns

In other words, we have a strange optimization issue: the same operations take roughly 8 times longer when performed through an object method invocation than through the "native" implementation.
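To make the size of the gap concrete, the reported totals can be converted into a per-byte cost and a slowdown factor. The following is only a small sketch: the class and method names (TimingRatio, nanosPerByte, slowdown) are made up for illustration, and only the figures are taken from the results above.

```java
// Illustrative helper (not part of the original test): converts the reported
// totals into per-byte costs and a slowdown factor.
final class TimingRatio {

    // Array size used in the test: 1_000_000_000 bytes.
    static final long BYTES = 1_000_000_000L;

    // Average time spent per byte for a given total.
    static double nanosPerByte(long totalNanos) {
        return (double) totalNanos / BYTES;
    }

    // How many times slower the first measurement is than the second.
    static double slowdown(long slowNanos, long fastNanos) {
        return (double) slowNanos / fastNanos;
    }

    public static void main(String[] args) {
        long loop = 312_446_094L;   // "Loop" result from the question
        long dis = 2_555_898_090L;  // "DataInputStream" result from the question
        System.out.printf("loop: %.2f ns/byte%n", nanosPerByte(loop));
        System.out.printf("dis:  %.2f ns/byte%n", nanosPerByte(dis));
        System.out.printf("slowdown: %.1fx%n", slowdown(dis, loop));
    }
}
```

With these figures the plain loop costs about 0.31 ns per byte and DataInputStream about 2.56 ns per byte, which is a factor of about 8.2.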

The question: why?

For reference, here is the test code:

@Test
public void testBytes1() throws IOException {
    byte[] bytes = new byte[1_000_000_000];
    Random r = new Random();
    for (int i = 0; i < bytes.length; i++)
        bytes[i] = (byte) r.nextInt();

    do {
        System.out.println();

        bytes[r.nextInt(1_000_000_000)] = (byte) r.nextInt();

        testLoop(bytes);
        testDis(bytes);
        testMyDis(bytes);
        testViaMethod(bytes);
    } while (true);
}

private void testDis(byte[] bytes) throws IOException {
    long time1 = System.nanoTime();
    long c = 0;
    try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
         DataInputStream dis = new DataInputStream(bais)) {
        for (int i = 0; i < bytes.length; i++) {
            c += dis.readByte();
        }
    }
    long time2 = System.nanoTime();
    System.out.println("Dis: \t\t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
}

private void testMyDis(byte[] bytes) throws IOException {
    long time1 = System.nanoTime();
    long c = 0;
    try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
         MyDataInputStream dis = new MyDataInputStream(bais)) {
        for (int i = 0; i < bytes.length; i++) {
            c += dis.readByte();
        }
    }
    long time2 = System.nanoTime();
    System.out.println("My Dis: \t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
}

private void testViaMethod(byte[] bytes) throws IOException {
    long time1 = System.nanoTime();
    long c = 0;
    try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
        for (int i = 0; i < bytes.length; i++) {
            c += readByte(bais);
        }
    }
    long time2 = System.nanoTime();
    System.out.println("Via method: \t\t" + (time2 - time1) + "\t\t\t\t" + c);
}

private void testLoop(byte[] bytes) {
    long time1 = System.nanoTime();
    long c = 0;
    for (int i = 0; i < bytes.length; i++) {
        c += bytes[i];
    }
    long time2 = System.nanoTime();
    System.out.println("Loop: \t\t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
}

public final byte readByte(InputStream in) throws IOException {
    int ch = in.read();
    if (ch < 0)
        throw new EOFException();
    return (byte)(ch);
}

static class MyDataInputStream implements Closeable {

    InputStream in;

    MyDataInputStream(InputStream in) {
        this.in = in;
    }

    public final byte readByte() throws IOException {
        int ch = in.read();
        if (ch < 0)
            throw new EOFException();
        return (byte)(ch);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

PS: An update for those who doubt my results. This is the printout when running with -XX:+PrintCompilation -verbose:gc -XX:CICompilerCount=1:

     37    1             java.lang.String::hashCode (55 bytes)
     41    2             java.lang.String::charAt (29 bytes)
     43    3             java.lang.String::indexOf (70 bytes)
     49    4             java.lang.AbstractStringBuilder::ensureCapacityInternal (16 bytes)
     52    5             java.lang.AbstractStringBuilder::append (29 bytes)
    237    6             java.util.Random::nextInt (7 bytes)
    237    9     n       sun.misc.Unsafe::compareAndSwapLong (native)   
    238    7             java.util.concurrent.atomic.AtomicLong::get (5 bytes)
    238    8             java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
    239   10             java.util.Random::next (47 bytes)
    239   11 %           fias.TestArrays::testBytes1 @ 15 (77 bytes)
   9645   11 %           fias.TestArrays::testBytes1 @ -2 (77 bytes)   made not entrant

   9646   12 %           fias.TestArrays::testLoop @ 10 (77 bytes)
   9964   12 %           fias.TestArrays::testLoop @ -2 (77 bytes)   made not entrant
Loop:               318726397               -500090432
   9965   13             java.io.DataInputStream::readByte (23 bytes)
   9966   14  s          java.io.ByteArrayInputStream::read (36 bytes)
   9967   15 % !         fias.TestArrays::testDis @ 37 (279 bytes)
Dis:                2684374258              -500090432
  12651   16             fias.TestArrays$MyDataInputStream::readByte (23 bytes)
  12652   17 % !         fias.TestArrays::testMyDis @ 37 (279 bytes)
My Dis:             2675570541              -500090432
  15327   18             fias.TestArrays::readByte (20 bytes)
  15328   19 % !         fias.TestArrays::testViaMethod @ 23 (179 bytes)
Via method:         2367507141              -500090432

  17694   20             fias.TestArrays::testLoop (77 bytes)
  17699   21 %           fias.TestArrays::testLoop @ 10 (77 bytes)
Loop:               374525891               -500090567
  18069   22   !         fias.TestArrays::testDis (279 bytes)
Dis:                2674626125              -500090567
  20745   23   !         fias.TestArrays::testMyDis (279 bytes)
My Dis:             2671418683              -500090567
  23417   24   !         fias.TestArrays::testViaMethod (179 bytes)
Via method:         2359181776              -500090567

Loop:               315081855               -500090663
Dis:                2558738649              -500090663
My Dis:             2627056034              -500090663
Via method:         311692727               -500090663

Loop:               317813286               -500090778
Dis:                2565161726              -500090778
My Dis:             2630665760              -500090778
Via method:         314594434               -500090778

Loop:               313695660               -500090797
Dis:                2568251556              -500090797
My Dis:             2635236578              -500090797
Via method:         311882312               -500090797

Loop:               316781686               -500090929
Dis:                2563535623              -500090929
My Dis:             2638487613              -500090929
Via method:         313170789               -500090929

UPD-2: Here are a benchmark and results kindly provided by @maaartinus.

Answer 1

Surprisingly, the reason is the try-with-resources statement on MyDataInputStream / DataInputStream.

If we move the wrapper's initialization inside the try block, the performance becomes the same as for the loop / method invocation:

private void testMyDis(byte[] bytes) throws IOException {
    final long time1 = System.nanoTime();
    long c = 0;
    try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
        final MyDataInputStream dis = new MyDataInputStream(bais);
        for (int i = 0; i < bytes.length; i++) {
            c += dis.readByte();
        }
    }
    final long time2 = System.nanoTime();
    System.out.println("My Dis: \t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
}

I think that with that unnecessary resource the JIT cannot apply Range Check Elimination.
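To try both shapes side by side, here is a self-contained sketch that runs the two variants on the same data and checks that they produce the same sum. The class name TryWithResourcesVariants, the method names, the array size and the number of rounds are all assumptions for illustration. The timings it prints depend on the JVM version and flags, so it is a starting point for experiments rather than a proof of the explanation above.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

final class TryWithResourcesVariants {

    // Same shape as MyDataInputStream from the question.
    static final class MyDataInputStream implements Closeable {
        private final InputStream in;

        MyDataInputStream(InputStream in) {
            this.in = in;
        }

        byte readByte() throws IOException {
            int ch = in.read();
            if (ch < 0)
                throw new EOFException();
            return (byte) ch;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Variant from the question: the wrapper is declared as a second resource.
    static long sumWrapperAsResource(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             MyDataInputStream dis = new MyDataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        return c;
    }

    // Variant from this answer: the wrapper is a plain local inside the try block.
    static long sumWrapperAsLocal(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
            final MyDataInputStream dis = new MyDataInputStream(bais);
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = new byte[10_000_000]; // assumed size, smaller than in the question
        new Random(42).nextBytes(bytes);
        for (int round = 0; round < 5; round++) { // assumed number of rounds
            long t0 = System.nanoTime();
            long a = sumWrapperAsResource(bytes);
            long t1 = System.nanoTime();
            long b = sumWrapperAsLocal(bytes);
            long t2 = System.nanoTime();
            System.out.println("resource: " + (t1 - t0) + " ns, local: " + (t2 - t1)
                    + " ns, sums equal: " + (a == b));
        }
    }
}
```

Both methods close the underlying ByteArrayInputStream; the only difference is whether the wrapper itself takes part in the try-with-resources bookkeeping.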

Answer 2

The answer is already in the test: the extra cost is due to function invocation. We commonly encourage writing short, clean functions instead of long ones, and assume that a function invocation has a very low cost. But the invocation cost is still larger than a direct memory access.

In this case, for testLoop, we can estimate from the question's timings that a memory read costs ~0.3 ns (including the integer operations, e.g. i++ and c +=). The other tests add 2 layers of function invocation on top of that, so each function call costs roughly 1 ns. So in fact a function call is very fast.

The only catch is that each test makes 2 000 000 000 function calls, which is a really large number.

Here is another test case to demonstrate the function call cost: do not use any stream, just read the bytes through additional function calls.

Add the following function:

public final long getByte( long c, byte value, int dep ) {
    if ( dep > 0 ) {
        return getByte( c, value, dep - 1);
    }
    return c + value;
}

Then invoke it in testLoop like this:

c = getByte( c, bytes[i], 2);

The final cost then rises to the same level:

Loop: 4044010718 -499870245
Dis: 5182272442 -499870245
My Dis: 5228065271 -499870245
Via method: 655108198 -499870245
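For anyone who wants to rerun this experiment in isolation, the modified loop can be sketched as a standalone class. The names CallDepthLoop and sumViaCalls, the array size and the random seed are assumptions for illustration; getByte is the function from this answer, made static so the sketch runs without a test class.

```java
import java.util.Random;

final class CallDepthLoop {

    // Function from the answer: recurses dep times before adding value to c.
    static long getByte(long c, byte value, int dep) {
        if (dep > 0) {
            return getByte(c, value, dep - 1);
        }
        return c + value;
    }

    // Body of testLoop with every array read routed through getByte.
    static long sumViaCalls(byte[] bytes, int depth) {
        long c = 0;
        for (int i = 0; i < bytes.length; i++) {
            c = getByte(c, bytes[i], depth);
        }
        return c;
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[10_000_000]; // assumed size, smaller than in the question
        new Random(1).nextBytes(bytes);
        long time1 = System.nanoTime();
        long c = sumViaCalls(bytes, 2);
        long time2 = System.nanoTime();
        System.out.println("Loop: \t" + (time2 - time1) + "\t" + c);
    }
}
```

Whether the JIT inlines the recursive calls depends on the JVM and its settings, so the measured cost of the extra call layers may differ from the numbers above.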

2 answers

Answer 1: score 3, accepted, 2014-06-19 08:16:24

Answer 2: score -1, 2014-06-18 09:39:20
