
Partial deserialization and serialization in Java?

There are a huge number of libraries and approaches out there to serialize and deserialize objects in Java. What I would like to do involves rather large and complex objects which need to be sent back and forth between processing nodes.

However, each node is only interested in one or a few, usually small, parts of the whole object. The processing node processes that part and creates a new part that needs to be spliced into the existing serialized object before it is sent on.

For this, two things would be of high importance:

  • being able to deserialize just parts of the serialized object (and thus save parsing/deserialization time, object creation time, memory, ...) and to add the serialization of some new part to the existing serialized object (again saving time and memory). Skipping the unwanted parts of the serialized version should be extremely fast and efficient, and should ideally be possible in a streaming mode, without the need to keep the whole serialized data in memory at once.
  • overall compact and fast serialization and deserialization.

I am pretty flexible as to how much automation I get for creating typed objects versus untyped maps and lists: if all else fails, I would be able to represent the whole object as a nested data structure of just maps, arrays and the basic datatypes boolean, String and Number.

UPDATE: I forgot to mention two additional, rather important requirements:

  • the solution must work with the existing objects, i.e. it is not possible to re-implement the current object using e.g. a different collections class.

  • ideally the solution should be based on open-source software, because the software I need this for will itself be published as open source.

It sounds like you're planning a design where a whole bunch of data is sent to a processing node, and that node will only read/modify/write a small part of it, but will then send the whole bundle on to another node.

Why not have the host that has all the data figure out which node needs which data, and only send that data? Then processing can happen in parallel instead of in a daisy chain. Your total network traffic will also be lower than if every node forwards a full copy of everything, which costs O(n*m).
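As a rough illustration of that idea, here is a minimal sketch in plain Java. Everything in it is an assumption made for the example: the class name `ScatterGather`, the representation of the big object as a `Map<String, String>` of named parts, and the modelling of each remote node as a local `UnaryOperator<String>`. In a real system the call inside `supplyAsync` would be a network round trip to the node.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.UnaryOperator;

/** Hypothetical coordinator: each "node" is modelled as a local function over one named part. */
final class ScatterGather {

    /**
     * Hands each node only the part it is registered for, runs all nodes in parallel,
     * and merges the returned parts into a copy of the original object.
     */
    static Map<String, String> process(Map<String, String> whole,
                                       Map<String, UnaryOperator<String>> nodeByPart) {
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, UnaryOperator<String>> entry : nodeByPart.entrySet()) {
            String part = whole.get(entry.getKey());          // only this part is "sent"
            UnaryOperator<String> node = entry.getValue();
            // Assumption: stands in for an asynchronous request to a remote node.
            pending.put(entry.getKey(), CompletableFuture.supplyAsync(() -> node.apply(part)));
        }
        // Parts that no node asked for are carried over untouched.
        Map<String, String> result = new LinkedHashMap<>(whole);
        pending.forEach((name, future) -> result.put(name, future.join()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> whole = new LinkedHashMap<>();
        whole.put("text", "some large document");
        whole.put("title", "draft");

        Map<String, UnaryOperator<String>> nodes = new LinkedHashMap<>();
        nodes.put("title", String::toUpperCase);

        System.out.println(process(whole, nodes));
    }
}
```

With this layout, the bulk of the object never leaves the host; only the small parts travel to the nodes and back.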

It might be worth designing your own message format, potentially based on JSON, a binary encoding, or something else.
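One possible shape for such a format, sketched with only the JDK: every part is written as a name, a length, and the payload bytes, so a reader can skip parts it does not care about without decoding them, and a node can replace one part while copying the rest through as a stream. The framing, the class name `SectionStream`, and all method names are assumptions invented for this example, not an existing library or standard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical framing: each section is [name via writeUTF][int length][payload bytes]. */
final class SectionStream {

    static void writeAll(Map<String, byte[]> sections, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        for (Map.Entry<String, byte[]> e : sections.entrySet()) {
            writeSection(data, e.getKey(), e.getValue());
        }
        data.flush();
    }

    static void writeSection(DataOutputStream data, String name, byte[] payload) throws IOException {
        data.writeUTF(name);
        data.writeInt(payload.length);
        data.write(payload);
    }

    /** Returns the payload of the wanted section, skipping all others; null if it is absent. */
    static byte[] readOnly(String wanted, InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        while (true) {
            String name;
            try {
                name = data.readUTF();
            } catch (EOFException endOfStream) {
                return null;
            }
            int length = data.readInt();
            if (name.equals(wanted)) {
                byte[] payload = new byte[length];
                data.readFully(payload);
                return payload;
            }
            skipFully(data, length); // unwanted payload is never decoded
        }
    }

    /** Copies in to out, replacing (or appending) one section without decoding the others. */
    static void splice(InputStream in, OutputStream out, String name, byte[] newPayload)
            throws IOException {
        DataInputStream src = new DataInputStream(in);
        DataOutputStream dst = new DataOutputStream(out);
        byte[] buffer = new byte[8192];
        boolean replaced = false;
        while (true) {
            String current;
            try {
                current = src.readUTF();
            } catch (EOFException endOfStream) {
                break;
            }
            int length = src.readInt();
            if (current.equals(name)) {
                skipFully(src, length);
                writeSection(dst, name, newPayload);
                replaced = true;
            } else {
                dst.writeUTF(current);
                dst.writeInt(length);
                for (int left = length; left > 0; ) { // copy in chunks, constant memory
                    int n = Math.min(left, buffer.length);
                    src.readFully(buffer, 0, n);
                    dst.write(buffer, 0, n);
                    left -= n;
                }
            }
        }
        if (!replaced) {
            writeSection(dst, name, newPayload);
        }
        dst.flush();
    }

    private static void skipFully(DataInputStream data, int count) throws IOException {
        int remaining = count;
        while (remaining > 0) {
            int skipped = data.skipBytes(remaining);
            if (skipped == 0) {
                data.readByte(); // throws EOFException if the stream is truncated
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> sections = new LinkedHashMap<>();
        sections.put("text", "a very large document".getBytes(StandardCharsets.UTF_8));
        sections.put("tags", "old".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeAll(sections, wire);

        ByteArrayOutputStream forwarded = new ByteArrayOutputStream();
        splice(new ByteArrayInputStream(wire.toByteArray()), forwarded,
                "tags", "new".getBytes(StandardCharsets.UTF_8));
        byte[] tags = readOnly("tags", new ByteArrayInputStream(forwarded.toByteArray()));
        System.out.println(new String(tags, StandardCharsets.UTF_8));
    }
}
```

The payload of each section could itself be JSON or any other encoding; the length prefix is what makes skipping cheap. A sketch like this does not handle nested parts or names longer than `writeUTF` allows.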


 