
Apache Storm: Track tuples by unique ID from Source Spout to Final Bolt

I want a way to uniquely identify tuples throughout a whole Storm topology, so that each tuple can be tracked from the spout to the final bolt.

As I understand it, a unique message ID can be passed along with an emit from a spout, for example:

// UUID.randomUUID() returns a UUID, so convert it to a String
String msgID = UUID.randomUUID().toString();
// emits a line from user tasks with msg id
outputCollector.emit(new Values(task), msgID);

This ID is somehow returned to the spout when the tuple is acked. (Can this be simulated earlier, to get back the passed ID at any point?) However, calling getMessageId() on a tuple, for example:

inputTuple.getMessageId()

returns a new message ID generated for the tuple, not the one passed in at the spout. Reference: https://groups.google.com/forum/#!topic/storm-user/xBEqMDa-RZs

Questions

1) Is there a way to get the tuple.getMessageId() value at the moment the collector emits the tuple?

2) Alternatively, can the messageId passed in at the spout somehow be retrieved from the tuple at any spout or bolt in the topology?

End solution: I want to be able to set an ID on a tuple when it is emitted, and then be able to identify that tuple again at any point in the Storm topology.

Or will the unique messageId that my system tracks have to be passed as a field/value on each output of each spout and bolt?

Thanks

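There is no way to access the system-generated ID at the producer; it is only available at the consumer, via tuple.getMessageId(). To track tuples the way you want, you need to (as you suggested yourself) add the ID as a regular field value of the tuple and copy it, in each bolt, into the corresponding output tuple.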
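The approach in this answer can be sketched without any Storm classes. The following is a minimal plain-Java model, not Storm API: `TrackedTuple`, `emitFromSource`, and `process` are hypothetical names invented for illustration. It only shows the shape of the idea: the ID is assigned once at the source, and every later stage copies it into its output.

```java
import java.util.UUID;
import java.util.function.Function;

public class TrackingIdSketch {
    // Hypothetical stand-in for a tuple: the tracking ID is an ordinary field
    // that travels next to the payload.
    static final class TrackedTuple {
        final String trackingId;
        final String payload;

        TrackedTuple(String trackingId, String payload) {
            this.trackingId = trackingId;
            this.payload = payload;
        }
    }

    // "Spout" stage: the unique ID is generated exactly once, at the source.
    static TrackedTuple emitFromSource(String task) {
        return new TrackedTuple(UUID.randomUUID().toString(), task);
    }

    // "Bolt" stage: transforms the payload and copies the tracking ID unchanged
    // into the output tuple.
    static TrackedTuple process(TrackedTuple in, Function<String, String> work) {
        return new TrackedTuple(in.trackingId, work.apply(in.payload));
    }

    public static void main(String[] args) {
        TrackedTuple source = emitFromSource("resize image");
        TrackedTuple afterFirstBolt = process(source, String::toUpperCase);
        TrackedTuple afterFinalBolt = process(afterFirstBolt, p -> p + " [done]");

        // The same ID is visible at every stage.
        System.out.println(source.trackingId.equals(afterFinalBolt.trackingId));
        System.out.println(afterFinalBolt.payload);
    }
}
```

In a real topology the same pattern would mean declaring the ID as one of the output fields of the spout and of every bolt, and reading it from the input tuple before each emit.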

There are several parts to this answer. First, as you correctly point out, it's up to you to come up with a unique ID in your spout for each tuple you emit. Second, if you want to access that ID anywhere in your topology, then add that ID to the composite tuple emitted by the spout. Third (just for completeness), if there's anything in your emitted tuple that you'll need to know when handling an ack or a fail in your spout, then add that information as part of a composite value that makes up your message ID.

Just as an example, I usually use the tuple itself as the message ID, too, when emitting a tuple from a spout:

outputCollector.emit(myTuple, myTuple);

This might be overkill, but at least I have access to all of the information in the tuple everywhere.
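The third point of this answer, a composite message ID, can be illustrated with a small plain-Java model. This is a sketch under assumptions, not Storm code: `MessageId` and `SketchSpout` are hypothetical classes, and only the `ack(Object)` and `fail(Object)` method shapes mirror the callbacks a Storm spout receives. The point is that whatever object was passed as the message ID comes back in those callbacks, so it can carry everything needed to handle them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class CompositeMessageIdSketch {
    // Hypothetical composite message ID: the unique ID plus the data the
    // source needs again when the tuple is acked or failed.
    static final class MessageId {
        final String trackingId;
        final String task;

        MessageId(String trackingId, String task) {
            this.trackingId = trackingId;
            this.task = task;
        }
    }

    // Plain-Java stand-in for a spout.
    static final class SketchSpout {
        final List<String> completed = new ArrayList<>();
        final List<String> replayQueue = new ArrayList<>();

        // Builds the message ID that would accompany the emitted tuple.
        MessageId emit(String task) {
            return new MessageId(UUID.randomUUID().toString(), task);
        }

        // On success, record which tracking ID finished.
        void ack(Object msgId) {
            completed.add(((MessageId) msgId).trackingId);
        }

        // On failure, the task is recovered from the message ID itself,
        // so no separate lookup table is needed to replay it.
        void fail(Object msgId) {
            replayQueue.add(((MessageId) msgId).task);
        }
    }

    public static void main(String[] args) {
        SketchSpout spout = new SketchSpout();
        MessageId first = spout.emit("task-a");
        MessageId second = spout.emit("task-b");

        spout.ack(first);
        spout.fail(second);

        System.out.println(spout.completed.size());
        System.out.println(spout.replayQueue);
    }
}
```

Using the whole tuple as the message ID, as described above, is the limiting case of this design: the composite value simply contains every field.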


 