
Why does Java serialization take up so much space?

I tried serializing instances of Byte and Integer and was shocked by how much space they took up when they were received on the other end. Why is it that an Integer holds only 4 bytes of data, but takes up over 10 times that many bytes once serialized? I mean in C++, a final class has a 64-bit class identifier, plus its contents. Going off that logic, I would expect an Integer to take up 64 + 32, or 96 bits when serialized.

import java.io.*;

public class Test {
    public static void main (String[] ar) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutput out = new ObjectOutputStream(bos);   
        out.writeObject(new Integer(32));
        byte[] yourBytes = bos.toByteArray();
        System.out.println("length: " + yourBytes.length + " bytes");
    }
}

Output:

length: 81 bytes

Update:

// Fragment: place inside a class, with java.io.* and java.util.* imported.
public static void main(String[] args) throws IOException {

    {
    ByteArrayOutputStream bos1 = new ByteArrayOutputStream();
    ObjectOutput out1 = new ObjectOutputStream(bos1);
    out1.writeObject(new Boolean(false));
    byte[] yourBytes = bos1.toByteArray();
    System.out.println("1 Boolean length: " + yourBytes.length);
    }

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = new ObjectOutputStream(bos);
    for (int i = 0; i < 1000; ++i) {
        out.writeObject(new Boolean(true)); // 47 bytes
    }
    byte[] yourBytes = bos.toByteArray();
    System.out.println("1000 Booleans length: " + yourBytes.length); // 7040 bytes

    final int count = 1000;

    ArrayList<Boolean> listBoolean = new ArrayList<>(count);
    listBoolean.addAll(Collections.nCopies(count, Boolean.TRUE));
    System.out.printf("ArrayList: %d%n", sizeOf(listBoolean)); // 5096 bytes

    Boolean[] arrayBoolean = new Boolean[count];
    Arrays.fill(arrayBoolean, true);
    System.out.printf("Boolean[]: %d%n", sizeOf(arrayBoolean)); // 5083 bytes

    boolean[] array = new boolean[count];
    Arrays.fill(array, true);
    System.out.printf("boolean[]: %d%n", sizeOf(array)); // 1027 bytes

    BitSet bits = new BitSet(count);
    bits.set(0, count);
    System.out.printf("BitSet: %d%n", sizeOf(bits)); // 201 bytes
}

static int sizeOf(Serializable obj) throws IOException {
    ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
    ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
    objsOut.writeObject(obj);
    return bytesOut.toByteArray().length;
}

Output:

1 Boolean length: 47 (47 bytes per Boolean)

1000 Booleans length: 7040 (7 bytes per Boolean)

ArrayList: 5096 (5 bytes per Boolean)

Boolean[]: 5083 (5 bytes per Boolean)

boolean[]: 1027 (1 byte per boolean)

BitSet: 201 (about 1/5 of a byte per boolean)

Though Radiodef has clarified why the serialized object is so large, I would like to make another point here so we don't forget the optimization built into Java's serialization algorithm (and into almost all such algorithms).

When you write another Integer object (or another object of a class that has already been written) to the same stream, the size does not simply double. That is, the total will not be 81 * 2 = 162 bytes in this case:

ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutput out = new ObjectOutputStream(bos);
out.writeObject(new Integer(32));
out.writeObject(new Integer(65)); // same class, same stream
byte[] yourBytes = bos.toByteArray();
System.out.println("length: " + yourBytes.length + " bytes");

The way it works is that when an instance of a class is serialized for the first time on a stream, the stream writes information about the whole class: the class name plus the name of each field in the class. That is why the byte count is higher. This is basically there to handle class evolution (versioning) properly.

While it sends the class metadata for the first time, it also caches that information in a local table, called a value cache or indirection table. The next time another instance of the same class is serialized (remember that the cache applies only per stream, and only until reset() is called), it writes just a marker pointing back at the cached class description (a 4-byte handle) instead of the full metadata, so the size is much smaller.
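The caching described above can be checked with a small experiment. This is a minimal sketch, not part of the original answer: the class name `HandleReuseDemo` and the helper `sizeOfIntegers` are made up for illustration. It writes distinct Integer objects to one stream, with and without calling `reset()` between writes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class HandleReuseDemo {
    /**
     * Writes `count` distinct Integer objects to a single ObjectOutputStream
     * and returns the total number of bytes produced.
     * Hypothetical helper, for illustration only.
     */
    static int sizeOfIntegers(int count, boolean resetBetween) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            for (int i = 0; i < count; i++) {
                if (resetBetween && i > 0) {
                    // Discards the stream's handle table, so the next object
                    // is written with its full class description again.
                    out.reset();
                }
                // Values outside the small-Integer cache, so each one is a
                // distinct object rather than a repeat of the same instance.
                out.writeObject(Integer.valueOf(1000 + i));
            }
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        int one = sizeOfIntegers(1, false);
        int two = sizeOfIntegers(2, false);
        int twoWithReset = sizeOfIntegers(2, true);
        System.out.println("one Integer:             " + one);
        System.out.println("two Integers:            " + two);
        System.out.println("cost of the second one:  " + (two - one));
        System.out.println("two Integers with reset: " + twoWithReset);
    }
}
```

The first Integer should come out at the 81 bytes seen in the question, while the second adds only a few bytes: a tag for the new object, a tag plus the 4-byte handle for the class description, and the int value itself. The same pattern would explain the question's 7040 bytes for 1000 Booleans: 47 for the first plus 999 * 7 for the rest. With `reset()` in between, the class description is written again and the total grows accordingly.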

java.lang.Byte and java.lang.Integer are objects, so at the very least the qualified names of their classes also need to be stored for them to be deserialized. The serialVersionUID needs to be stored as well, and so on. It is easy to see how this extra information quickly inflates the size.

If you want to learn about the serialization format, there is an article about it at JavaWorld: http://www.javaworld.com/article/2072752/the-java-serialization-algorithm-revealed.html
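To see that metadata for yourself, you can search the serialized bytes for the class and field names. The following is only a rough sketch (the class name `StreamContentsDemo` and the helper `serialize` are invented for this example). It relies on the fact that the names are stored as plain ASCII text inside the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class StreamContentsDemo {
    /** Serializes one object with default Java serialization (illustrative helper). */
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(Integer.valueOf(32));

        // ISO-8859-1 maps every byte to exactly one char, so ASCII names
        // embedded in the binary stream remain searchable as text.
        String text = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println("total bytes: " + bytes.length);
        System.out.println("has class name java.lang.Integer: " + text.contains("java.lang.Integer"));
        System.out.println("has superclass name java.lang.Number: " + text.contains("java.lang.Number"));
        System.out.println("has field name value: " + text.contains("value"));

        // The 8-byte serialVersionUID that is also stored for each class.
        long uid = ObjectStreamClass.lookup(Integer.class).getSerialVersionUID();
        System.out.println("Integer serialVersionUID: " + uid);
    }
}
```

The two class names alone account for 33 of the 81 bytes, and each of the two class descriptions also carries an 8-byte serialVersionUID. The 4 bytes of actual int data are a small fraction of the total.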


If you're concerned about the size of serialized data, pick a format which is more compact:

import java.util.*;
import java.io.*;

class Example {
    public static void main(String[] args) throws IOException {
        final int count = 1000;

        ArrayList<Boolean> list = new ArrayList<>(count);
        list.addAll(Collections.nCopies(count, Boolean.TRUE));
        System.out.printf("ArrayList: %d%n", sizeOf(list));

        boolean[] array = new boolean[count];
        Arrays.fill(array, true);
        System.out.printf("boolean[]: %d%n", sizeOf(array));

        BitSet bits = new BitSet(count);
        bits.set(0, count);
        System.out.printf("BitSet: %d%n", sizeOf(bits));
    }

    static int sizeOf(Serializable obj) throws IOException {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
        objsOut.writeObject(obj);
        return bytesOut.toByteArray().length;
    }
}

Output:

ArrayList: 5096
boolean[]: 1027
BitSet: 201

Example on Ideone.
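If you control both ends of the connection, you can go further and skip object serialization entirely. Here is a minimal sketch of that idea, assuming the receiver already knows how many values to expect and what type they are. The names `CompactFormatDemo`, `writeRawInts` and `packFlags` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

class CompactFormatDemo {
    /** Writes each int as exactly 4 big-endian bytes, with no class metadata. */
    static byte[] writeRawInts(int[] values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bos.toByteArray();
    }

    /** Packs `count` true flags into one bit each via BitSet.toByteArray(). */
    static byte[] packFlags(int count) {
        BitSet bits = new BitSet(count);
        bits.set(0, count); // sets bits 0 (inclusive) to count (exclusive)
        return bits.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        System.out.println("1000 raw ints:     " + writeRawInts(values).length + " bytes"); // 4000 bytes
        System.out.println("1000 packed flags: " + packFlags(1000).length + " bytes");      // 125 bytes
    }
}
```

The trade-off is that nothing in the raw bytes says what they are: there is no class name, no serialVersionUID, and no element count. The reader has to agree on the layout separately, for example by calling `DataInputStream.readInt()` the right number of times or rebuilding the flags with `BitSet.valueOf(byte[])`.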
