How can I measure each tuple's end-to-end latency in Storm?

I'm currently evaluating a new grouping method in Storm, so throughput and latency matter most. However, I have run into difficulty measuring the end-to-end latency of each tuple.

I tried putting a timestamp inside each tuple and calculating the latency when the tuple is received downstream in my topology, but the results contain negative numbers.

Because I'm running the topology in cluster mode, the clocks cannot be precisely synchronized across the machines in the cluster (I tried NTP, but it is not precise enough either), which may be the cause of the issue.
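To illustrate why unsynchronized clocks can produce negative values, here is a minimal sketch. The class name, method name, and all numbers are made up for illustration; it only models "receiver clock minus sender stamp" with a fixed offset between the two machines.

```java
// Hypothetical illustration (not Storm code): how a clock offset between two
// machines distorts a latency computed from two different clocks.
final class ClockSkewDemo {
    // trueLatencyMs: real transit time of the tuple.
    // receiverOffsetMs: how far the receiver's clock is ahead (+) or behind (-) the sender's.
    static long measuredLatencyMs(long trueLatencyMs, long receiverOffsetMs) {
        long sendStamp = 1_000L; // sender's clock at emit time (arbitrary value)
        // Receiver's clock reading when the tuple arrives.
        long receiveStamp = sendStamp + trueLatencyMs + receiverOffsetMs;
        return receiveStamp - sendStamp;
    }
}
```

With a true latency of 2 ms and a receiver clock that is 5 ms behind, `measuredLatencyMs(2, -5)` comes out negative, which matches the symptom described above.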

So does Storm itself provide a way to measure the end-to-end latency of each tuple? Or is there any trick I can use to achieve this?

Finally got a solution to this!

We wrote our own spout that implements the IRichSpout interface. In our nextTuple() method we assign a unique ID to each tuple and put the emit timestamp into a ConcurrentHashMap, then emit the tuple together with that ID like this:

collector.emit(new Values(str), ID);

This makes Storm aware that the tuple has an ID. Then, in our ack(Object msgId) method, we look up the emit time in the hash map using msgId as the key and compare it with the current time, which gives us the latency for this tuple.

Both timestamps are taken in the spout, and as we set the spout parallelism to 1, we don't have to worry about the time syncing problem anymore.
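The bookkeeping described above can be sketched roughly as follows. This is not our actual spout: `LatencyTracker` and its method names are hypothetical, and the Storm parts are left out so that the snippet compiles on its own. The sketch also uses `System.nanoTime()` rather than a wall-clock timestamp, which is an assumption on my side; it works here because emit and ack are timed in the same JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of Storm): remembers when each message ID was
// emitted and turns an ack into a latency measurement.
final class LatencyTracker {
    private final ConcurrentMap<Object, Long> emitTimes = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Call from nextTuple(); pass the returned ID as the message ID to collector.emit(...).
    long recordEmit() {
        long id = nextId.incrementAndGet();
        emitTimes.put(id, System.nanoTime());
        return id;
    }

    // Call from ack(Object msgId); returns the latency in nanoseconds, or -1 for an unknown ID.
    long recordAck(Object msgId) {
        Long start = emitTimes.remove(msgId);
        return start == null ? -1L : System.nanoTime() - start;
    }

    // Call from fail(Object msgId) so entries for failed tuples do not pile up in the map.
    void recordFail(Object msgId) {
        emitTimes.remove(msgId);
    }

    // Number of tuples emitted but not yet acked or failed.
    int pending() {
        return emitTimes.size();
    }
}
```

In the spout, nextTuple() would call `recordEmit()` and pass the result as the second argument of `collector.emit(new Values(str), ID)`, and ack(Object msgId) would call `recordAck(msgId)`. Keep in mind that Storm only calls ack() once the whole tuple tree has been processed, so the bolts must anchor and ack their tuples, and the measured value includes the time the acknowledgement takes to travel back to the spout.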
