First attempt of serialization is slow in Java?
Consider a simple program (posted below) that serializes a given number of objects using `ObjectOutputStream`. It calls the same function many times to serialize objects to a file. The first call takes longer than subsequent calls (the difference depends on the number of objects being serialized):
Serializing 10000 objects...
Time elapsed: 498ms
Time elapsed: 168ms
Time elapsed: 186ms
Serializing 100000 objects...
Time elapsed: 1815ms
Time elapsed: 1352ms
Time elapsed: 1338ms
Serializing 500000 objects...
Time elapsed: 8341ms
Time elapsed: 7247ms
Time elapsed: 7051ms
What is the reason for this difference? I tried to do the same thing without serialization, i.e. writing a byte array, and there is no such difference.
Update: the same thing happens if the program does not call the same method many times but first serializes objects in a for loop and then calls the method. The subsequent method calls are faster:
"manual" serialization, time elapsed: 535
Time elapsed: 170ms
Time elapsed: 193ms
Time elapsed: 139ms
So JIT compilation cannot cause that difference.
Code:
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationTest {
    static final int COUNT = 10000, TRIES = 3;

    static class Simple implements Serializable {
        String name;
        int index;

        Simple(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    public static void main(String[] args) throws IOException {
        int count = COUNT;
        if (args.length > 0) {
            count = Integer.parseInt(args[0]);
        }
        List<Simple> objects = new ArrayList<Simple>();
        for (int i = 0; i < count; i++) {
            objects.add(new Simple("simple" + i, i));
        }
        String filename = args.length > 1 ? args[1] : "objects";
        System.err.println("Serializing " + count + " objects...");
        for (int i = 0; i < TRIES; i++) {
            System.err.println("Time elapsed: " +
                serializeOneByOne(objects, filename + i + ".bin", false) + "ms");
        }
    }

    static long serializeOneByOne(List<?> objects, String filename, boolean buffered)
            throws IOException {
        OutputStream underlying = new FileOutputStream(filename);
        if (buffered) {
            underlying = new BufferedOutputStream(underlying);
        }
        ObjectOutputStream output = new ObjectOutputStream(underlying);
        // take started after the output stream is open
        // although it does not make a big difference
        long started = System.currentTimeMillis();
        try {
            for (Object s : objects) {
                output.writeObject(s);
            }
        } finally {
            output.close();
        }
        long ended = System.currentTimeMillis();
        return ended - started;
    }
}
The complete answer is:
ObjectOutputStream has some internal static caches for the types of objects being serialized (see ObjectStreamClass), so subsequent serializations of objects of the same type are faster than the first one.
JIT compilation may affect the performance if we consider the compilation of ObjectOutputStream.writeObject (and not of the user-defined method, as mentioned in other answers). Thanks to all who mentioned JIT compilation in their answers.
This also explains why there is no difference when writing a byte array instead of serializing objects: a) there are no static caches, and b) FileOutputStream.write(byte[]) calls the native writeBytes, so almost no JIT compilation takes place.
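
One way to observe the cache effect described above in isolation is to time the class descriptor lookup itself. The following is only a minimal sketch: the class name `DescriptorCacheSketch` and the helper `timeLookup` are made up for illustration, and the exact numbers will vary by JVM and machine. It relies on the public `ObjectStreamClass.lookup(Class)` method, which returns the serialization descriptor of a class (or `null` if the class is not serializable).

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class DescriptorCacheSketch {
    // Same shape as the Simple class from the question.
    static class Simple implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        int index;
    }

    // Hypothetical helper: returns the time in nanoseconds taken by one descriptor lookup.
    static long timeLookup(Class<?> type) {
        long started = System.nanoTime();
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        long elapsed = System.nanoTime() - started;
        if (descriptor == null) {
            throw new IllegalArgumentException(type + " is not serializable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // The first lookup has to build the descriptor via reflection;
        // later lookups are expected to be served from the internal cache.
        System.out.println("first lookup:  " + timeLookup(Simple.class) + " ns");
        System.out.println("second lookup: " + timeLookup(Simple.class) + " ns");
    }
}
```

On a typical run the second lookup should be noticeably cheaper than the first, although part of that gap may also come from class loading rather than from the cache alone.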
In Java, the JIT (just-in-time) compiler compiles a method once it has been called often (some recommend calling it 10,000 times).
But Java serialization is known to be slow and to use a huge amount of memory.
You can do better by serializing the objects yourself using a DataOutputStream.
Java's built-in serialization is for quick demo projects: it works bug-free right out of the box.
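
To illustrate the DataOutputStream suggestion, here is a minimal sketch of hand-written serialization for the `Simple` class from the question. The wire format (an `int` count followed by a `writeUTF` name and a `writeInt` index per object) and the names `ManualSerializationSketch`, `write` and `read` are assumptions made for this example, not something taken from the original answer. It writes to memory instead of a file so that it is easy to run.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ManualSerializationSketch {
    // Plain data class; it does not need to implement Serializable here.
    static class Simple {
        final String name;
        final int index;

        Simple(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    // Writes the object count, then each object's fields in a fixed order.
    static byte[] write(List<Simple> objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(objects.size());
            for (Simple s : objects) {
                out.writeUTF(s.name);   // 2-byte length prefix plus modified UTF-8
                out.writeInt(s.index);  // 4 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static List<Simple> read(byte[] data) {
        List<Simple> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                int index = in.readInt();
                result.add(new Simple(name, index));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Simple> objects = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            objects.add(new Simple("simple" + i, i));
        }
        byte[] data = write(objects);
        List<Simple> copy = read(data);
        System.out.println(copy.size() + " objects, " + data.length + " bytes");
    }
}
```

The trade-off is that the format is now the programmer's responsibility: there is no class metadata, no versioning and no support for object graphs, and `writeUTF` is limited to strings whose encoded form fits in 65535 bytes.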
The JVM maintains a call count for each method in your program. Each time you call the same method, its call count increases. As soon as the call count reaches the JIT compilation threshold, the method is compiled by the JIT. The next time the method is called, it executes faster because the JVM now runs the compiled native code instead of interpreting the method. Hence the first call of the same method takes more time than the subsequent calls.
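
A common way to keep this warm-up effect out of a measurement is to run the code a few times untimed before starting the clock. The sketch below is one possible arrangement, not a rigorous benchmark: the class name `WarmUpSketch`, the helpers `makeObjects` and `serializeAll`, and the number of warm-up rounds are all assumptions chosen for illustration. It serializes into memory so that disk I/O does not add noise.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class WarmUpSketch {
    static final int WARM_UP_ROUNDS = 5; // arbitrary value for this sketch
    static final int TIMED_ROUNDS = 3;

    static class Simple implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int index;

        Simple(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    static List<Simple> makeObjects(int count) {
        List<Simple> objects = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            objects.add(new Simple("simple" + i, i));
        }
        return objects;
    }

    // Serializes all objects into memory and returns the number of bytes written.
    static int serializeAll(List<Simple> objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Simple s : objects) {
                out.writeObject(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        List<Simple> objects = makeObjects(10_000);

        // Untimed rounds: give class loading, descriptor caching and the JIT
        // a chance to happen before anything is measured.
        for (int i = 0; i < WARM_UP_ROUNDS; i++) {
            serializeAll(objects);
        }

        for (int i = 0; i < TIMED_ROUNDS; i++) {
            long started = System.nanoTime();
            int size = serializeAll(objects);
            long elapsedMs = (System.nanoTime() - started) / 1_000_000;
            System.out.println("Time elapsed: " + elapsedMs + "ms (" + size + " bytes)");
        }
    }
}
```

On HotSpot, running with `-XX:+PrintCompilation` prints which methods get compiled and when, which can help confirm whether compilation is still going on during the timed rounds.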
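The first run incurs a lot of one-time costs, including JIT compilation, class loading, reflection, and so on. This is normal and most of the time nothing to worry about, because the effect on production applications is negligible.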