
How to estimate the serialization size of objects in Java without actually serializing them?

To improve messaging in a cluster, it's important to know at runtime roughly how big a message is (so I can decide whether to prefer local or remote processing).

I could only find frameworks that estimate the in-memory size of an object based on Java instrumentation. I've tested classmexer, which didn't come close to the serialization size, and SizeOf from SourceForge.

In a small test case, SizeOf was off by around 10% and 10x faster than serialization. (Still, transient breaks the estimation completely, and since e.g. ArrayList's backing array is transient but is still serialized as an array, it's not easy to patch SizeOf. But I could live with that.)

On the other hand, 10x faster with a 10% error doesn't seem very good. Any ideas how I could do better?

Update: I also tested ObjectSize ( http://sourceforge.net/projects/objectsize-java ). Results only seem good for objects that don't use inheritance :(

The size an object takes in memory at runtime doesn't necessarily have any bearing on its serialized size. The example you've mentioned is transient fields. Other examples include objects that implement Externalizable and handle serialization themselves.

If an object implements Externalizable or provides readObject() / writeObject(), then your best bet is to serialize the object to a memory buffer to find out the size. It's not going to be fast, but it will be accurate.
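
A minimal sketch of measuring the exact size this way, assuming standard Java serialization through ObjectOutputStream. The class names CountingOutputStream and SerializedSize are made up for illustration. Instead of filling a memory buffer, this variant counts the bytes and throws them away, which avoids allocating the buffer but still pays the full cost of serializing:

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: counts the bytes written to it and discards them. */
final class CountingOutputStream extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        count += len;
    }

    long getCount() {
        return count;
    }
}

final class SerializedSize {
    private SerializedSize() {}

    /**
     * Returns the number of bytes a fresh ObjectOutputStream writes for obj.
     * The count includes the stream header written by the constructor.
     */
    static long of(Object obj) throws IOException {
        CountingOutputStream counter = new CountingOutputStream();
        // close() flushes the stream's internal buffer into the counter
        try (ObjectOutputStream out = new ObjectOutputStream(counter)) {
            out.writeObject(obj);
        }
        return counter.getCount();
    }

    public static void main(String[] args) throws IOException {
        List<String> message = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            message.add("item-" + i);
        }
        System.out.println("serialized bytes: " + of(message));
    }
}
```

Because a fresh stream is used per call, the result is the size of the object serialized on its own, not its size inside a longer-lived stream.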

If an object uses the default serialization, then you could amend SizeOf to take transient fields into account.

After serializing many objects of the same type, you may be able to build up a "serialization profile" for that type that correlates serialized size with the runtime size from SizeOf. This will then allow you to estimate the runtime size quickly (using SizeOf) and correlate it to the serialized size, to arrive at a more accurate result than that provided by SizeOf alone.
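
One way such a profile might look, as a rough sketch only. SerializationProfile is a hypothetical class, and the runtimeSizer function is a stand-in for whatever runtime-size estimator is used (SizeOf in this discussion); its actual API is not assumed here. The profile keeps running totals per class and scales the runtime size by the observed ratio:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

/** Hypothetical per-class profile: ratio of serialized size to runtime size. */
final class SerializationProfile {
    /** Running totals of all samples recorded for one class. */
    private static final class Totals {
        long runtimeBytes;
        long serializedBytes;
    }

    private final Map<Class<?>, Totals> totalsByClass = new ConcurrentHashMap<>();
    // Assumption: a fast runtime-size estimator supplied by the caller.
    private final ToLongFunction<Object> runtimeSizer;

    SerializationProfile(ToLongFunction<Object> runtimeSizer) {
        this.runtimeSizer = runtimeSizer;
    }

    /** Call after a real serialization to record the measured size. */
    void record(Object obj, long serializedBytes) {
        Totals t = totalsByClass.computeIfAbsent(obj.getClass(), c -> new Totals());
        synchronized (t) {
            t.runtimeBytes += runtimeSizer.applyAsLong(obj);
            t.serializedBytes += serializedBytes;
        }
    }

    /** Returns the estimated serialized size, or -1 if the class has no samples yet. */
    long estimate(Object obj) {
        Totals t = totalsByClass.get(obj.getClass());
        if (t == null) {
            return -1;
        }
        synchronized (t) {
            if (t.runtimeBytes == 0) {
                return -1;
            }
            double ratio = (double) t.serializedBytes / t.runtimeBytes;
            return Math.round(ratio * runtimeSizer.applyAsLong(obj));
        }
    }
}
```

A single average ratio is the simplest possible model; whether it is accurate enough depends on how uniform the objects of a given class are.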

There are many good points in the other answers. One thing that is missing is that the serialization mechanism may cache certain objects.

For example, say you serialize a series of objects A, B, and C, all of the same class, each holding two objects o1 and o2. Let us say that the object overhead is 100 bytes and that the objects look like:

Object shared = new Object();
Object shared2 = new Object();

A.o1 = new Object();
A.o2 = shared;

B.o1 = shared2;
B.o2 = shared;

C.o1 = shared2;
C.o2 = shared;

For simplicity's sake, we might say that each generic object takes 50 bytes to serialize, so A's serialization size is 100 (overhead) + 50 (o1) + 50 (o2) = 200 bytes. One could make a similar naive estimate for B and C. However, if all three are serialized by the same object output stream before reset is called, what you will see in the stream is a serialization of A with its o1 and o2, then a serialization of B with its o1, BUT only a reference to o2, since that same object was already serialized. So let's say an object reference takes 16 bytes: the size of B is now 100 (overhead) + 50 (o1) + 16 (reference for o2) = 166 bytes. So the size that it takes to serialize has now changed! We could do a similar calculation for C and get 100 (overhead) + 16 + 16 = 132 bytes with both objects cached, so the serialization size differs for all three objects, with a ~33% difference between the largest and smallest.

So unless you serialize the entire object without a cache every time, it is difficult to accurately estimate the size required to serialize the object.
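
The effect can be observed with a small experiment, sketched here under a few assumptions: Holder and SharedReferenceDemo are made-up names, and byte arrays stand in for the generic objects because a plain Object is not Serializable. The 100/50/16 figures above are illustrative, so the real numbers will differ, but the ordering should be the same:

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

/** Hypothetical message class holding two payload objects. */
final class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    final byte[] o1;
    final byte[] o2;

    Holder(byte[] o1, byte[] o2) {
        this.o1 = o1;
        this.o2 = o2;
    }
}

final class SharedReferenceDemo {
    /** Counts bytes and discards them. */
    private static final class Counter extends OutputStream {
        long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /** Writes all holders to ONE stream; returns the bytes added by each writeObject call. */
    static long[] sizesInOneStream(Holder... holders) throws IOException {
        Counter counter = new Counter();
        long[] sizes = new long[holders.length];
        try (ObjectOutputStream out = new ObjectOutputStream(counter)) {
            out.flush();                  // push the stream header out first
            long before = counter.count;  // header is excluded from the per-object sizes
            for (int i = 0; i < holders.length; i++) {
                out.writeObject(holders[i]);
                out.flush();              // drain the stream's internal buffer
                sizes[i] = counter.count - before;
                before = counter.count;
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        byte[] shared = new byte[50];
        byte[] shared2 = new byte[50];
        Holder a = new Holder(new byte[50], shared);
        Holder b = new Holder(shared2, shared);
        Holder c = new Holder(shared2, shared);

        long[] sizes = sizesInOneStream(a, b, c);
        System.out.println("A=" + sizes[0] + " B=" + sizes[1] + " C=" + sizes[2]);
        System.out.println("C alone=" + sizesInOneStream(c)[0]);
    }
}
```

Note that class descriptors are cached by the stream as well, so A also pays a one-time cost for describing the Holder class that B and C do not.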

Just an idea - you could serialize the object to a byte buffer first, get its length, and then decide whether to send the buffer's content to a remote location or do the local processing (if the decision depends on the message's size).

Drawback - you may waste time on serialization if you later decide not to use the buffer. But if you estimate instead, the estimation effort is wasted whenever you do need to serialize (because in that case you estimate first and serialize in a second step).
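
A rough sketch of this serialize-once approach. The names SerializeThenDecide, Route, Decision, and maxRemoteBytes are hypothetical, and the rule that small messages go remote while large ones stay local is only an assumption about the routing policy. The point is that the bytes are kept, so the serialization work is reused if the message is sent:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

final class SerializeThenDecide {
    enum Route { LOCAL, REMOTE }

    /** The routing decision together with the already-serialized bytes. */
    static final class Decision {
        final Route route;
        final byte[] payload;

        Decision(Route route, byte[] payload) {
            this.route = route;
            this.payload = payload;
        }
    }

    private SerializeThenDecide() {}

    /**
     * Serializes the message once and routes it by its real size.
     * Assumed policy: messages up to maxRemoteBytes are sent remotely.
     */
    static Decision decide(Object message, int maxRemoteBytes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(message);
        }
        byte[] payload = buffer.toByteArray();
        Route route = payload.length <= maxRemoteBytes ? Route.REMOTE : Route.LOCAL;
        return new Decision(route, payload);
    }

    public static void main(String[] args) throws IOException {
        Decision d = decide(new byte[4096], 1024);
        System.out.println(d.route + " (" + d.payload.length + " bytes)");
    }
}
```

If the route is REMOTE, payload can be sent as is; if it is LOCAL, the serialization time was spent only to make the decision.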

There may be no way to estimate the serialized size of an object with both good precision and speed. For example, some object could be a cache of the digits of Pi that constructs itself at runtime given only the length you need. It would serialize only the 4 bytes of the 'length' attribute, while the object could be using hundreds of megabytes of memory to store those digits.

The only solution I can think of is adding your own interface with a method int estimateSerializeSize(). For every object implementing this interface, you would call this method to get the proper size. If some object does not implement it, you would have to fall back to SizeOf.
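
A minimal sketch of that idea, using the Pi cache as the example. Only the method name estimateSerializeSize() comes from the text above; the interface name SerializedSizeEstimable, the PiDigitsCache class, and the SizeEstimator helper are invented for illustration, and the fallback function stands in for SizeOf without assuming its API:

```java
import java.io.Serializable;
import java.util.function.ToLongFunction;

/** Hypothetical interface: objects report their own expected serialized size. */
interface SerializedSizeEstimable {
    int estimateSerializeSize();
}

/** Illustrative cache: only 'length' is serialized, the digits are rebuilt on demand. */
final class PiDigitsCache implements Serializable, SerializedSizeEstimable {
    private static final long serialVersionUID = 1L;

    private final int length;
    // transient: possibly huge in memory, never written to the stream.
    // Computing the digits is omitted from this sketch.
    private transient byte[] digits;

    PiDigitsCache(int length) {
        this.length = length;
    }

    int length() {
        return length;
    }

    @Override
    public int estimateSerializeSize() {
        // Payload only: one int field. Stream header and class descriptor are ignored.
        return Integer.BYTES;
    }
}

final class SizeEstimator {
    private SizeEstimator() {}

    /** Uses the object's own estimate when available, otherwise the supplied fallback. */
    static long estimate(Object obj, ToLongFunction<Object> fallback) {
        if (obj instanceof SerializedSizeEstimable) {
            return ((SerializedSizeEstimable) obj).estimateSerializeSize();
        }
        return fallback.applyAsLong(obj);
    }
}
```

The class author knows which fields actually reach the stream, so the estimate can be cheap and close, at the cost of having to maintain it by hand.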

