
How to get amount of serialized bytes representing a Java object?

What syntax would I use to get the number of bytes representing a string and compare it to the number of bytes representing an ArrayList holding that string, for example?

I am using a multi-agent system to send objects via messages and I want to keep track of how much space each message takes up. The method doesn't have to be dead-on accurate, as long as it scales proportionally to the actual size of the object. E.g. a Vector of strings of length 4 will report as smaller than a Vector of strings of length 5.

You can convert your object into a byte array using ObjectOutputStream and ByteArrayOutputStream:

public static int sizeof(Object obj) throws IOException {

    ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
    ObjectOutputStream objectOutputStream = new ObjectOutputStream(byteOutputStream);

    objectOutputStream.writeObject(obj);
    objectOutputStream.flush();
    objectOutputStream.close();

    return byteOutputStream.toByteArray().length;
}

I just tested this out. The object whose size you're trying to calculate needs to implement Serializable (which means you may have to mark every object as such simply to get its size, which might not be desirable). I wrote a quick and dirty program to test this out:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Sizeof {

    public static class Person implements Serializable {
        private String name;
        private String age;

        public Person(String name, String age) {
            this.name = name;
            this.age = age;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getAge() {
            return age;
        }

        public void setAge(String age) {
            this.age = age;
        }
    }

    public static void main(String[] args) {
        Person p1 = new Person("Alby", "20");
        Person p2 = new Person("VeryLongName", "100");
        String s1 = "This is it";
        String s2 = "This";

        try {
            System.out.println("p1 " + sizeof(p1));
            System.out.println("p2 " + sizeof(p2));
            System.out.println("s1 " + sizeof(s1));
            System.out.println("s2 " + sizeof(s2));                                 
        }

        catch(Exception e) {
            e.printStackTrace();
        }
    }

    public static int sizeof(Object obj) throws IOException {

        ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream objectOutputStream = new ObjectOutputStream(byteOutputStream);

        objectOutputStream.writeObject(obj);
        objectOutputStream.flush();
        objectOutputStream.close();

        return byteOutputStream.toByteArray().length;
    }
}

Which gave me:

p1 85
p2 94
s1 17
s2 11

EDIT

Stephen C's answer highlights some caveats with this method.

I needed to check this accurately for each memcache write while investigating a server bug where memcache sizes were exceeded. To avoid the overhead of a big byte array for large objects, I extended OutputStream as a counter:

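import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class CheckSerializedSize extends OutputStream {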

    /** Serialize obj and count the bytes */
    public static long getSerializedSize(Serializable obj) {
        try {
            CheckSerializedSize counter = new CheckSerializedSize();
            ObjectOutputStream objectOutputStream = new ObjectOutputStream(counter);
            objectOutputStream.writeObject(obj);
            objectOutputStream.close();
            return counter.getNBytes();
        } catch (Exception e) {
            // Serialization failed
            return -1;
        }
    }

    private long nBytes = 0;

    private CheckSerializedSize() {}

    @Override
    public void write(int b) throws IOException {
        ++nBytes;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        nBytes += len;
    }

    public long getNBytes() {
        return nBytes;
    }
}

You can serialise each object into a byte array and compare the lengths of the arrays. This is not very accurate in the general case, but it often gives a good approximation.

Have a look at ObjectOutputStream (which can be used to serialise an object and turn it into bytes) and ByteArrayOutputStream (which can be used to hold the serialised bytes).
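As a minimal sketch of that suggestion, applied to the comparison the question asks about (a string versus an ArrayList holding that string): the class name SerializedSizeDemo and the method name serializedSize are made up for this illustration, and the sizes are whatever default Java serialization produces, not in-memory sizes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical helper class; the names are illustrative only.
final class SerializedSizeDemo {

    /** Returns the number of bytes default Java serialization writes for obj. */
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources closes (and therefore flushes) the stream before we read the size
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        String s = "This";
        ArrayList<String> list = new ArrayList<>();
        list.add(s);

        System.out.println("string " + serializedSize(s));
        System.out.println("list   " + serializedSize(list));
    }
}
```

The list comes out larger than the bare string because the stream also carries the ArrayList class description and its bookkeeping fields.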

I don't think you've got much choice but to modify your code so that it measures the message sizes at runtime.

You could just serialize example objects and capture and measure the serialized size. This has the following problems:

  • You can never be sure that the objects are typical.
  • Various aggregation effects mean that it is hard to deduce the size of a message from the serialized size of its component objects. (For instance, class signatures are only encoded once per serialization.)
  • This approach tells you nothing about the relative frequency of different message types.
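The aggregation effect in the second point can be seen with a small experiment. This is only a sketch: the Point class and the incrementalSizes helper are invented for the illustration. It writes two instances of the same class to one stream and records how many bytes each write adds.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class DescriptorOnceDemo {

    // Hypothetical payload class used only for this illustration.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Writes every object to ONE stream and returns the bytes added by each write. */
    static int[] incrementalSizes(Serializable... objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int[] added = new int[objects.length];
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.flush();
            int before = bytes.size(); // only the stream header so far
            for (int i = 0; i < objects.length; i++) {
                out.writeObject(objects[i]);
                out.flush(); // push buffered bytes through so the count is current
                added[i] = bytes.size() - before;
                before = bytes.size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return added;
    }

    public static void main(String[] args) {
        int[] added = incrementalSizes(new Point(1, 2), new Point(3, 4));
        System.out.println("first  Point adds " + added[0] + " bytes");
        System.out.println("second Point adds " + added[1] + " bytes");
    }
}
```

The second Point adds fewer bytes than the first, because the class description is written in full only the first time and referenced afterwards. Sizes measured on objects in isolation therefore overestimate what they cost inside a larger message.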

If you can manage it, you will get more accurate results by measuring the actual messages. This would most likely entail modifying the agent framework to count, measure and (ideally) classify messages into different kinds. The framework might already have hooks for doing this.
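What such counting, measuring and classifying might look like is sketched below. Everything here is an assumption: MessageSizeStats, record and statsFor are invented names, the "kind" is just a string chosen by the caller, and where you would call record from depends entirely on the agent framework in use. The sketch is not thread-safe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-kind size tracker; not part of any real agent framework.
final class MessageSizeStats {

    private final Map<String, LongSummaryStatistics> byKind = new TreeMap<>();

    /** Measures the payload's serialized size and files it under the given kind. */
    void record(String kind, Serializable payload) {
        byKind.computeIfAbsent(kind, k -> new LongSummaryStatistics())
              .accept(serializedSize(payload));
    }

    /** Returns the statistics for a kind (empty statistics if nothing was recorded). */
    LongSummaryStatistics statsFor(String kind) {
        return byKind.getOrDefault(kind, new LongSummaryStatistics());
    }

    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        MessageSizeStats stats = new MessageSizeStats();
        stats.record("greeting", "This");
        stats.record("greeting", "This is it");
        stats.record("counter", Integer.valueOf(42));

        for (Map.Entry<String, LongSummaryStatistics> e : stats.byKind.entrySet()) {
            LongSummaryStatistics s = e.getValue();
            System.out.println(e.getKey() + ": count=" + s.getCount()
                    + " total=" + s.getSum() + " max=" + s.getMax());
        }
    }
}
```

Recording per kind gives both the size distribution and the relative frequency of each message type, which measuring hand-picked example objects cannot tell you.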

The method doesn't have to be dead-on accurate, as long as it scales proportionally to the actual size of the object. Eg a Vector of strings of length 4 will report as larger than a Vector of strings of length 5.

(I assume that you meant smaller than ...)

Your example illustrates one of the problems of trying to estimate serialized object sizes. A serialization of a Vector<String> of size 4 could be smaller ... or larger ... than a Vector<String> of size 5. It depends on what the String values are. Additionally, if a message contains two Vector<String> objects, the serialized size occupied by the vectors will be less than the sum of the sizes of the two vectors when they are serialized separately.
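Both claims are easy to check with a short program. The following is a sketch under assumptions: VectorSizeDemo, vectorOf and serializedSize are illustrative names, and an ArrayList stands in for "a message containing two vectors".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Vector;

final class VectorSizeDemo {

    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    static Vector<String> vectorOf(String... values) {
        return new Vector<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        // Four long strings versus five one-character strings.
        Vector<String> four = vectorOf("alpha-alpha", "bravo-bravo", "charlie-charlie", "delta-delta");
        Vector<String> five = vectorOf("a", "b", "c", "d", "e");

        int sizeFour = serializedSize(four);
        int sizeFive = serializedSize(five);
        System.out.println("4 elements: " + sizeFour + " bytes");
        System.out.println("5 elements: " + sizeFive + " bytes");

        // A stand-in "message" that carries both vectors in one serialization.
        ArrayList<Vector<String>> message = new ArrayList<>();
        message.add(four);
        message.add(five);

        System.out.println("separately: " + (sizeFour + sizeFive) + " bytes");
        System.out.println("together:   " + serializedSize(message) + " bytes");
    }
}
```

Here the 4-element vector is the larger one because its strings are longer, and the combined message is smaller than the two separate sizes added up, since the stream header and the Vector class description are written only once.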

Have a look at: http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html

The closest thing that comes to mind would be serializing it and reading the number of bytes.
