Strange method invocation optimization issue
I have been investigating why DataInputStream.readByte() is so slow, and found an interesting but incomprehensible issue. I'm using jdk1.7.0_40, Windows 7 64 bit.
Suppose we have a huge byte array and read data from it. Let's compare 4 methods of reading this array byte by byte:
1. a plain loop over the array;
2. ByteArrayInputStream -> DataInputStream;
3. ByteArrayInputStream -> our own DataInputStream implementation (MyDataInputStream);
4. ByteArrayInputStream and a copy of the readByte() method from DataInputStream.
I have found the following results (after many iterations of the test loop):
DataInputStream took approx. 2555898090 ns
MyDataInputStream took approx. 2630664298 ns
readByte() copy took 309265568 ns
In other words, we have a strange optimization issue: the same operations take roughly 8 times longer via an object method invocation than via the "native" implementation.
The question: why?
For reference, here is the test code:
package fias;

import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

import org.junit.Test;

public class TestArrays {

    // A 1_000_000_000-byte array needs roughly 1 GB of heap on its own.
    @Test
    public void testBytes1() throws IOException {
        byte[] bytes = new byte[1_000_000_000];
        Random r = new Random();
        for (int i = 0; i < bytes.length; i++)
            bytes[i] = (byte) r.nextInt();
        // Runs until stopped manually; one random byte is overwritten each round.
        do {
            System.out.println();
            bytes[r.nextInt(1_000_000_000)] = (byte) r.nextInt();
            testLoop(bytes);
            testDis(bytes);
            testMyDis(bytes);
            testViaMethod(bytes);
        } while (true);
    }

    // ByteArrayInputStream -> DataInputStream
    private void testDis(byte[] bytes) throws IOException {
        long time1 = System.nanoTime();
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             DataInputStream dis = new DataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        long time2 = System.nanoTime();
        System.out.println("Dis: \t\t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }

    // ByteArrayInputStream -> MyDataInputStream
    private void testMyDis(byte[] bytes) throws IOException {
        long time1 = System.nanoTime();
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             MyDataInputStream dis = new MyDataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        long time2 = System.nanoTime();
        System.out.println("My Dis: \t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }

    // ByteArrayInputStream + copy of DataInputStream.readByte()
    private void testViaMethod(byte[] bytes) throws IOException {
        long time1 = System.nanoTime();
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
            for (int i = 0; i < bytes.length; i++) {
                c += readByte(bais);
            }
        }
        long time2 = System.nanoTime();
        System.out.println("Via method: \t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }

    // Plain loop over the array
    private void testLoop(byte[] bytes) {
        long time1 = System.nanoTime();
        long c = 0;
        for (int i = 0; i < bytes.length; i++) {
            c += bytes[i];
        }
        long time2 = System.nanoTime();
        System.out.println("Loop: \t\t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }

    public final byte readByte(InputStream in) throws IOException {
        int ch = in.read();
        if (ch < 0)
            throw new EOFException();
        return (byte) (ch);
    }

    static class MyDataInputStream implements Closeable {
        InputStream in;

        MyDataInputStream(InputStream in) {
            this.in = in;
        }

        public final byte readByte() throws IOException {
            int ch = in.read();
            if (ch < 0)
                throw new EOFException();
            return (byte) (ch);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }
}
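The test above needs JUnit and a 1 GB array. For readers who want to rerun the comparison at a smaller size, a standalone sketch of the same four strategies might look like the following. It is not the author's code: the class name ReadBench, its method names and the 100_000_000-byte default are hypothetical, and it only prints raw timings, so any ratios you observe depend on your JVM and hardware.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

// Standalone sketch (no JUnit) of the four read strategies from the question.
final class ReadBench {

    // Same shape as the question's MyDataInputStream.
    static final class MyDataInputStream implements Closeable {
        private final InputStream in;

        MyDataInputStream(InputStream in) {
            this.in = in;
        }

        byte readByte() throws IOException {
            int ch = in.read();
            if (ch < 0) throw new EOFException();
            return (byte) ch;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Same logic as DataInputStream.readByte(), as a plain helper method.
    static byte readByte(InputStream in) throws IOException {
        int ch = in.read();
        if (ch < 0) throw new EOFException();
        return (byte) ch;
    }

    static long sumLoop(byte[] bytes) {
        long c = 0;
        for (byte b : bytes) c += b;
        return c;
    }

    static long sumDis(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             DataInputStream dis = new DataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) c += dis.readByte();
        }
        return c;
    }

    static long sumMyDis(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             MyDataInputStream dis = new MyDataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) c += dis.readByte();
        }
        return c;
    }

    static long sumViaMethod(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
            for (int i = 0; i < bytes.length; i++) c += readByte(bais);
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default of 100 million bytes; pass another size as the first argument.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100_000_000;
        byte[] bytes = new byte[n];
        new Random(1).nextBytes(bytes);
        // Repeat so the JIT has a chance to compile each method before the later rounds.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumLoop(bytes);
            long t1 = System.nanoTime();
            long b = sumDis(bytes);
            long t2 = System.nanoTime();
            long d = sumMyDis(bytes);
            long t3 = System.nanoTime();
            long e = sumViaMethod(bytes);
            long t4 = System.nanoTime();
            System.out.println("loop " + (t1 - t0) + " ns, Dis " + (t2 - t1) + " ns, My Dis "
                    + (t3 - t2) + " ns, via method " + (t4 - t3) + " ns");
            if (a != b || a != d || a != e) throw new AssertionError("sums differ");
        }
    }
}
```

All four methods must return the same sum, which is a cheap sanity check that the timed loops really do the same work.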
PS: Update for those who doubt my results: here is the printout, using -XX:+PrintCompilation -verbose:gc -XX:CICompilerCount=1
37 1 java.lang.String::hashCode (55 bytes)
41 2 java.lang.String::charAt (29 bytes)
43 3 java.lang.String::indexOf (70 bytes)
49 4 java.lang.AbstractStringBuilder::ensureCapacityInternal (16 bytes)
52 5 java.lang.AbstractStringBuilder::append (29 bytes)
237 6 java.util.Random::nextInt (7 bytes)
237 9 n sun.misc.Unsafe::compareAndSwapLong (native)
238 7 java.util.concurrent.atomic.AtomicLong::get (5 bytes)
238 8 java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
239 10 java.util.Random::next (47 bytes)
239 11 % fias.TestArrays::testBytes1 @ 15 (77 bytes)
9645 11 % fias.TestArrays::testBytes1 @ -2 (77 bytes) made not entrant
9646 12 % fias.TestArrays::testLoop @ 10 (77 bytes)
9964 12 % fias.TestArrays::testLoop @ -2 (77 bytes) made not entrant
Loop: 318726397 -500090432
9965 13 java.io.DataInputStream::readByte (23 bytes)
9966 14 s java.io.ByteArrayInputStream::read (36 bytes)
9967 15 % ! fias.TestArrays::testDis @ 37 (279 bytes)
Dis: 2684374258 -500090432
12651 16 fias.TestArrays$MyDataInputStream::readByte (23 bytes)
12652 17 % ! fias.TestArrays::testMyDis @ 37 (279 bytes)
My Dis: 2675570541 -500090432
15327 18 fias.TestArrays::readByte (20 bytes)
15328 19 % ! fias.TestArrays::testViaMethod @ 23 (179 bytes)
Via method: 2367507141 -500090432
17694 20 fias.TestArrays::testLoop (77 bytes)
17699 21 % fias.TestArrays::testLoop @ 10 (77 bytes)
Loop: 374525891 -500090567
18069 22 ! fias.TestArrays::testDis (279 bytes)
Dis: 2674626125 -500090567
20745 23 ! fias.TestArrays::testMyDis (279 bytes)
My Dis: 2671418683 -500090567
23417 24 ! fias.TestArrays::testViaMethod (179 bytes)
Via method: 2359181776 -500090567
Loop: 315081855 -500090663
Dis: 2558738649 -500090663
My Dis: 2627056034 -500090663
Via method: 311692727 -500090663
Loop: 317813286 -500090778
Dis: 2565161726 -500090778
My Dis: 2630665760 -500090778
Via method: 314594434 -500090778
Loop: 313695660 -500090797
Dis: 2568251556 -500090797
My Dis: 2635236578 -500090797
Via method: 311882312 -500090797
Loop: 316781686 -500090929
Dis: 2563535623 -500090929
My Dis: 2638487613 -500090929
Via method: 313170789 -500090929
UPD-2: Here is a benchmark and results kindly provided by @maaartinus.
Surprisingly, the reason is the try-with-resources statement on MyDataInputStream / DataInputStream: if we move the initialization inside the try block, performance becomes like that of the loop/method invocation:
private void testMyDis(byte[] bytes) throws IOException {
    final long time1 = System.nanoTime();
    long c = 0;
    try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
        final MyDataInputStream dis = new MyDataInputStream(bais);
        for (int i = 0; i < bytes.length; i++) {
            c += dis.readByte();
        }
    }
    final long time2 = System.nanoTime();
    System.out.println("My Dis: \t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
}
I think that, with that unnecessary resource, the JIT cannot apply range check elimination.
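To time the two try shapes side by side, a standalone sketch along these lines might help. TwrShapes, sumBothInTwr, sumInnerInit and the default array size are hypothetical names and values, not from the benchmark mentioned above; the sketch only measures both shapes and does not by itself confirm the range-check-elimination explanation.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

// Hypothetical harness comparing the two try-with-resources shapes discussed in UPD-2.
final class TwrShapes {

    static final class MyDataInputStream implements Closeable {
        private final InputStream in;

        MyDataInputStream(InputStream in) {
            this.in = in;
        }

        byte readByte() throws IOException {
            int ch = in.read();
            if (ch < 0) throw new EOFException();
            return (byte) ch;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Original shape: both streams are declared as try-with-resources resources.
    static long sumBothInTwr(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             MyDataInputStream dis = new MyDataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) c += dis.readByte();
        }
        return c;
    }

    // UPD-2 shape: only the ByteArrayInputStream is a resource; the wrapper is a local.
    // Closing bais is all MyDataInputStream.close() would do anyway.
    static long sumInnerInit(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
            final MyDataInputStream dis = new MyDataInputStream(bais);
            for (int i = 0; i < bytes.length; i++) c += dis.readByte();
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default size; pass another one as the first argument.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100_000_000;
        byte[] bytes = new byte[n];
        new Random(7).nextBytes(bytes);
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumBothInTwr(bytes);
            long t1 = System.nanoTime();
            long b = sumInnerInit(bytes);
            long t2 = System.nanoTime();
            System.out.println("both in TWR: " + (t1 - t0) + " ns, inner init: " + (t2 - t1)
                    + " ns, same sum: " + (a == b));
        }
    }
}
```

The only difference between the two methods is where MyDataInputStream is created, so any consistent timing gap between them isolates the effect UPD-2 describes on your particular JVM.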
The answer is already in the test: the extra cost is due to function invocation.
We commonly encourage writing short, clean functions instead of long ones, and consider the cost of a function call to be very low. But the invocation cost is still larger than that of a direct memory access.
In this case, for testLoop, we can estimate that a memory read costs ~0.3 ns (including the integer operations, e.g. i++ and c +=): about 315 ms for 10^9 bytes. The other tests add 2 layers of function invocation per byte, so each function call costs ~1.1 ns ((2.56 s - 0.32 s) / (2 × 10^9 calls)). In practice, we can say a function call is very fast.
The only point is that there are 2 000 000 000 function calls in each run, which is a really large number.
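The per-byte and per-call arithmetic can be written out explicitly. This sketch uses one round of the question's printout (Loop 315081855 ns and Dis 2558738649 ns, both for 10^9 bytes); the class and method names are hypothetical. It assumes, as the answer does, that the whole difference is call overhead spread evenly over 2 calls per byte; that attribution is the answer's hypothesis, not something the arithmetic proves.

```java
import java.util.Locale;

// Hypothetical helper that reproduces the answer's back-of-the-envelope estimate.
final class PerCallCost {
    // Array length used by testBytes1 in the question.
    static final long BYTES = 1_000_000_000L;

    // Average cost of one timed pass, per byte read.
    static double nsPerByte(long totalNs) {
        return (double) totalNs / BYTES;
    }

    // Extra cost per call, assuming the gap between a stream-based pass and the plain
    // loop is spread evenly over `callsPerByte` calls per byte (readByte() + read() = 2).
    static double nsPerExtraCall(long streamNs, long loopNs, int callsPerByte) {
        return (double) (streamNs - loopNs) / (BYTES * callsPerByte);
    }

    public static void main(String[] args) {
        long loopNs = 315_081_855L;  // "Loop:" line from the printout
        long disNs = 2_558_738_649L; // "Dis:" line from the same round
        System.out.println(String.format(Locale.ROOT, "loop: %.3f ns/byte", nsPerByte(loopNs)));
        System.out.println(String.format(Locale.ROOT, "Dis: %.3f ns/byte", nsPerByte(disNs)));
        System.out.println(String.format(Locale.ROOT, "extra per call: %.3f ns",
                nsPerExtraCall(disNs, loopNs, 2)));
    }
}
```

With these inputs it prints 0.315, 2.559 and 1.122 ns respectively; Locale.ROOT keeps the decimal point independent of the system locale.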
There's another test case that demonstrates the function-call cost: do not use any stream, just read the bytes with additional function calls.
Add the function below:
public final long getByte(long c, byte value, int dep) {
    if (dep > 0) {
        return getByte(c, value, dep - 1);
    }
    return c + value;
}
Then invoke it in testLoop like this:
c = getByte(c, bytes[i], 2);
Then the total cost increases to the same level:
Loop: 4044010718 -499870245
Dis: 5182272442 -499870245
My Dis: 5228065271 -499870245
Via method: 655108198 -499870245
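A self-contained version of that experiment might look like the sketch below. Assumptions: getByte is made static here (the answer declares it as a public final instance method), the class name LoopWithCalls and the 100_000_000-byte default are hypothetical, and the program only prints timings without interpreting them.

```java
import java.util.Random;

// Standalone sketch of the answer's experiment: the same array sum, with and without
// extra (recursive) method calls per byte.
final class LoopWithCalls {

    // The answer's helper, made static: recurses `dep` times, then adds `value` to `c`.
    // With dep = 2 that is three getByte invocations per byte.
    static long getByte(long c, byte value, int dep) {
        if (dep > 0) {
            return getByte(c, value, dep - 1);
        }
        return c + value;
    }

    // Plain loop, like testLoop in the question.
    static long sumPlain(byte[] bytes) {
        long c = 0;
        for (int i = 0; i < bytes.length; i++) {
            c += bytes[i];
        }
        return c;
    }

    // testLoop with its body replaced by c = getByte(c, bytes[i], dep).
    static long sumWithCalls(byte[] bytes, int dep) {
        long c = 0;
        for (int i = 0; i < bytes.length; i++) {
            c = getByte(c, bytes[i], dep);
        }
        return c;
    }

    public static void main(String[] args) {
        // Hypothetical default size; pass another one as the first argument.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100_000_000;
        byte[] bytes = new byte[n];
        new Random(42).nextBytes(bytes);
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumPlain(bytes);
            long t1 = System.nanoTime();
            long b = sumWithCalls(bytes, 2);
            long t2 = System.nanoTime();
            System.out.println("plain: " + (t1 - t0) + " ns, with calls: " + (t2 - t1)
                    + " ns, same sum: " + (a == b));
        }
    }
}
```

Both methods compute the same sum, so the printed difference reflects only the added calls (plus whatever the JIT does or does not inline on your JVM).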