
How to design a JMS message containing large amounts of data

I am designing a system that uses an ETL tool to retrieve batches of data, i.e., inserts/updates/deletes for one or more tables, and puts them on a JMS topic to be processed later by multiple clients. Right now, each message on the topic represents a single insert/update/delete (I/U/D) record, and a special message delimits the end of the batch. It's important to process each batch in a single transaction, so a run of messages terminated by a special one is not ideal: both the publishing and the receiving sessions must be designed for multiple messages; the batch delimiter is a messy and very error-prone solution (each time we receive a message we need to check whether it's the last); the system is difficult to debug and maintain; and the number of messages on the topic quickly becomes huge (up to millions).

Now, I think the next natural step to improve the architecture is to pack all the records of a batch into a single JMS message, so that a received message corresponds to a single transaction, failures are easy to detect, there are no "orphan" records on the topic, etc. I only see advantages in doing so! Here are my questions:

  • What's the best way to create such a packed message? I think my choices are StreamMessage, BytesMessage or ObjectMessage. I excluded text and map messages because the first would require text parsing, which will kill performance, and the second doesn't really seem to fit the scenario. I'm leaning towards StreamMessage because it seems quite compact, although it will require a lot of work writing custom serialization code (even more so for BytesMessage). I'm not sure about ObjectMessage: how does it perform? Is there an out-of-the-box solution for this?
  • What's the maximum size allowed per message? Could it be on the order of hundreds of KB or even a few MB?

Thanks for the thoughts!

Giovanni

Instead of using one large message, you could use two (or more) queues, correlation IDs and a message selector.

Queueing:

  1. Post a notification message to the "notification queue" to indicate that processing should start
  2. Post command messages to the "command queue" with the correlation ID set to the notification message's message ID (you can use multiple command queues if the queue depth gets too high)
  3. Commit the transaction

Processing:

  1. Receive the notification message from the "notification queue" (e.g. with a message-driven bean)
  2. Receive and process all the related messages from the "command queue" using a message selector
  3. Commit the transaction
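
The two step lists above can be sketched as a small in-memory model. This is not JMS code: `Msg` and `BatchProtocol` are hypothetical names, and plain lists stand in for the two queues. In a real JMS client the filter would be a message selector along the lines of `JMSCorrelationID = '<notification message id>'` passed to `Session.createConsumer(destination, selector)`, with both sides using a transacted session.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a JMS message: an id, a correlation id and a body. */
final class Msg {
    final String id;
    final String correlationId;   // null for notification messages
    final String body;

    Msg(String id, String correlationId, String body) {
        this.id = id;
        this.correlationId = correlationId;
        this.body = body;
    }
}

/** In-memory model of the notification queue / command queue protocol. */
final class BatchProtocol {
    final List<Msg> notificationQueue = new ArrayList<>();
    final List<Msg> commandQueue = new ArrayList<>();
    private int nextId = 0;

    /** Producer side: one notification, then commands correlated to its id. */
    String publish(List<String> records) {
        Msg note = new Msg("ID:" + nextId++, null, "batch of " + records.size());
        notificationQueue.add(note);
        for (String r : records) {
            commandQueue.add(new Msg("ID:" + nextId++, note.id, r));
        }
        return note.id;   // a real producer would commit its transaction here
    }

    /** Consumer side: take one notification, then drain only its commands. */
    List<String> consumeNext() {
        List<String> batch = new ArrayList<>();
        if (notificationQueue.isEmpty()) {
            return batch;
        }
        Msg note = notificationQueue.remove(0);
        for (Iterator<Msg> it = commandQueue.iterator(); it.hasNext(); ) {
            Msg m = it.next();
            if (note.id.equals(m.correlationId)) {   // models the message selector
                batch.add(m.body);
                it.remove();
            }
        }
        return batch;     // a real consumer would commit its transaction here
    }
}
```

Because commands are matched by correlation ID rather than by position, the consumer needs no end-of-batch delimiter, and commands belonging to other batches stay on the queue untouched.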

Using raw bytes (e.g. a BytesMessage) is likely the least memory-intensive option.
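
As a rough sketch of what hand-written byte packing could look like, the following encodes a whole batch into one `byte[]` using only `java.io`. The `Change` record layout (operation code, table name, row key) and the `BatchCodec` name are assumptions made for illustration; the real records would carry whatever columns the ETL tool produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical change record: operation code, table name and a row key. */
final class Change {
    final char op;        // 'I', 'U' or 'D'
    final String table;
    final long key;

    Change(char op, String table, long key) {
        this.op = op;
        this.table = table;
        this.key = key;
    }
}

/** Packs a whole batch into one byte[] that could serve as a message body. */
final class BatchCodec {
    static byte[] encode(List<Change> batch) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(batch.size());   // record count replaces the end-of-batch delimiter
            for (Change c : batch) {
                out.writeChar(c.op);
                out.writeUTF(c.table);
                out.writeLong(c.key);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected for an in-memory stream
        }
        return buf.toByteArray();
    }

    static List<Change> decode(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int count = in.readInt();
            List<Change> batch = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                // Java evaluates arguments left to right, matching the write order.
                batch.add(new Change(in.readChar(), in.readUTF(), in.readLong()));
            }
            return batch;
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // truncated or corrupt payload
        }
    }
}
```

On the JMS side, the resulting array could be written with `BytesMessage.writeBytes(byte[])` and read back with `getBodyLength()` and `readBytes(byte[])`. Whether a payload of several MB is acceptable depends on the provider's configured limits, so that is worth checking before committing to one message per batch.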

If you work with Java objects, you can use a fast, byte-efficient serialization/deserialization library such as Kryo.

We happily use Kryo in production on a messaging system, but there are plenty of alternatives, such as the popular Google Protocol Buffers.
