
Why does Java serialization take up so much space?

I tried serializing instances of Byte and Integer and was shocked by how much space they took up when they were received on the other end. Why does an Integer hold only 4 bytes of data, yet take up more than 10 times that many bytes once serialized? In C++, as I understand it, a final class has a 64-bit class identifier plus its contents. By that logic, I would expect a serialized Integer to take 64 + 32 = 96 bits.

import java.io.*;

public class Test {
    public static void main (String[] ar) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutput out = new ObjectOutputStream(bos);   
        out.writeObject(new Integer(32));
        byte[] yourBytes = bos.toByteArray();
        System.out.println("length: " + yourBytes.length + " bytes");
    }
}

Output:

length: 81 bytes

Update:

public static void main(String[] args) throws IOException {

    {
        ByteArrayOutputStream bos1 = new ByteArrayOutputStream();
        ObjectOutput out1 = new ObjectOutputStream(bos1);
        out1.writeObject(new Boolean(false));
        byte[] yourBytes = bos1.toByteArray();
        System.out.println("1 Boolean length: " + yourBytes.length);
    }

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = new ObjectOutputStream(bos);
    for (int i = 0; i < 1000; ++i) {
        out.writeObject(new Boolean(true)); // 47 bytes for the first write, 7 bytes for each later one
    }
    byte[] yourBytes = bos.toByteArray();
    System.out.println("1000 Booleans length: " + yourBytes.length); // 7040 bytes

    final int count = 1000;

    ArrayList<Boolean> listBoolean = new ArrayList<>(count);
    listBoolean.addAll(Collections.nCopies(count, Boolean.TRUE));
    System.out.printf("ArrayList: %d%n", sizeOf(listBoolean)); // 5096 bytes

    Boolean[] arrayBoolean = new Boolean[count];
    Arrays.fill(arrayBoolean, true);
    System.out.printf("Boolean[]: %d%n", sizeOf(arrayBoolean)); // 5083 bytes

    boolean[] array = new boolean[count];
    Arrays.fill(array, true);
    System.out.printf("boolean[]: %d%n", sizeOf(array)); // 1027 bytes

    BitSet bits = new BitSet(count);
    bits.set(0, count);
    System.out.printf("BitSet: %d%n", sizeOf(bits)); // 201 bytes
}

static int sizeOf(Serializable obj) throws IOException {
    ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
    ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
    objsOut.writeObject(obj);
    return bytesOut.toByteArray().length;
}

Output:

1 Boolean length: 47 (47 bytes per Boolean)

1000 Booleans length: 7040 (7 bytes per Boolean)

ArrayList: 5096 (5 bytes per Boolean)

Boolean[]: 5083 (5 bytes per Boolean)

boolean[]: 1027 (about 1 byte per boolean)

BitSet: 201 (about 1/5 of a byte per boolean)

Though Radiodef has clarified why the serialized object is so large, I would like to make another point so that we don't forget an optimization built into Java's serialization algorithm (and into almost every serialization algorithm).

When you write another Integer object (or any object whose class has already been written to the stream), the size does not simply double. The total would not be 81 * 2 = 162 bytes in this case:

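ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutput out = new ObjectOutputStream(bos);
out.writeObject(new Integer(32)); // first Integer: full class description is written
out.writeObject(new Integer(65)); // second Integer: class description is only referenced
byte[] yourBytes = bos.toByteArray();
System.out.println("length: " + yourBytes.length + " bytes"); // well under 162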

The way it works is that when an instance of a class is serialized for the first time, the stream writes information about the whole class: the class name and the name of each field in the class. That is why the byte count is so high. This is mainly so that class evolution cases can be handled properly.

While it sends the class metadata for the first time, it also caches that information in a local cache called the value cache or indirection table. The next time another instance of the same class is serialized (remember that the cache applies only per stream, and only until reset() is called), it writes just a marker (a back-reference carrying a 4-byte handle), so the size is much smaller.
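The caching described above can be checked with a small sketch. The class and method names here (BackReferenceDemo, sizeOfTwoIntegers) are made up for illustration. By my count, the two-Integer stream should come to 91 bytes on a standard JDK (81 for the first write plus 10 for the second), and calling reset() between the writes should make the stream noticeably larger, because the class description has to be written again.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class BackReferenceDemo {
    // Writes two distinct Integer objects to one stream and returns the total
    // number of bytes. With resetBetween set, reset() is called between the
    // two writes, so the stream forgets what it has already written.
    static int sizeOfTwoIntegers(boolean resetBetween) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(Integer.valueOf(32));
            if (resetBetween) {
                out.reset(); // discards the per-stream cache of written classes and objects
            }
            out.writeObject(Integer.valueOf(65));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        System.out.println("shared class description: " + sizeOfTwoIntegers(false) + " bytes");
        System.out.println("with reset() in between:  " + sizeOfTwoIntegers(true) + " bytes");
    }
}
```

Two different values are used on purpose: writing the very same object twice would only emit a reference to the first object, which measures something else.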

java.lang.Byte and java.lang.Integer are objects, so at the very least the qualified names of their classes also need to be stored for them to be deserialized. The serialVersionUID needs to be stored as well, and so on. It is easy to see how this extra information quickly inflates the size.
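One way to see this for yourself is to dump the serialized bytes and look for the names. The following is only a rough sketch; StreamContentsDemo and its helpers are hypothetical names, and the dump simply replaces every non-printable byte with a dot.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class StreamContentsDemo {
    // Serializes a single object with the default Java serialization mechanism.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Keeps printable ASCII bytes and replaces everything else with '.',
    // so that names embedded in the stream stand out.
    static String printable(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(Integer.valueOf(32));
        System.out.println(bytes.length + " bytes: " + printable(bytes));
    }
}
```

The dump should show java.lang.Integer, its field name value, and the superclass java.lang.Number spelled out in full; the 4 bytes of actual payload sit at the very end.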

If you want to learn about the serialization format, there is an article about it at JavaWorld: http://www.javaworld.com/article/2072752/the-java-serialization-algorithm-revealed.html


If you're concerned about the size of serialized data, pick a format which is more compact:

import java.util.*;
import java.io.*;

class Example {
    public static void main(String[] args) throws IOException {
        final int count = 1000;

        ArrayList<Boolean> list = new ArrayList<>(count);
        list.addAll(Collections.nCopies(count, Boolean.TRUE));
        System.out.printf("ArrayList: %d%n", sizeOf(list));

        boolean[] array = new boolean[count];
        Arrays.fill(array, true);
        System.out.printf("boolean[]: %d%n", sizeOf(array));

        BitSet bits = new BitSet(count);
        bits.set(0, count);
        System.out.printf("BitSet: %d%n", sizeOf(bits));
    }

    static int sizeOf(Serializable obj) throws IOException {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        ObjectOutputStream objsOut = new ObjectOutputStream(bytesOut);
        objsOut.writeObject(obj);
        return bytesOut.toByteArray().length;
    }
}

Output:

ArrayList: 5096
boolean[]: 1027
BitSet: 201

Example on Ideone .
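If even that is too much, you can drop object serialization for simple values and write the raw data yourself. This is a minimal sketch, assuming both sides agree on the layout ahead of time; CompactEncodingDemo and its method names are made up for illustration. It uses DataOutputStream for a single int and BitSet.toByteArray() for a block of booleans.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

class CompactEncodingDemo {
    // Writes one int as raw big-endian bytes, with no class metadata at all.
    static int rawIntSize(int value) {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytesOut)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytesOut.size();
    }

    // Packs `count` true values into a BitSet and returns the size of its
    // raw byte form (8 booleans per byte).
    static int packedBooleansSize(int count) {
        BitSet bits = new BitSet(count);
        bits.set(0, count);
        return bits.toByteArray().length;
    }

    public static void main(String[] args) {
        System.out.println("raw int: " + rawIntSize(32) + " bytes");
        System.out.println("1000 packed booleans: " + packedBooleansSize(1000) + " bytes");
    }
}
```

The trade-off is that the stream no longer describes itself: the reader has to know it is getting an int or a fixed number of bits, and nothing guards against a class or layout change.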
