Apache Storm issue with dynamic redirection of tuples (baffling impact on end-to-end latency)
Below I include text explaining the issue I face in Storm. Anyway, I know it is a long post (just a heads up) and any comment/indication is more than welcome. Here goes the description:
I have installed Storm 0.9.4 and ZooKeeper 3.4.6 on a single server (2 sockets with Intel Xeon 8-core chips, 96 GB RAM, running CentOS) and I have set up a pseudo-distributed, single-node Storm runtime. My configuration consists of 1 ZooKeeper server, 1 nimbus process, 1 supervisor process, and 1 worker process (when topologies are submitted), all running on the same machine. The purpose of my experiment is to see Storm's behavior on a single-node setting when input load is dynamically distributed among executor threads.
For the purpose of my experiment I have input tuples that consist of 1 long and 1 integer value. The input data come from two spouts that read tuples from disk files, and I control the input rates to follow this pattern:

- 200 tuples/second for the first 24 seconds (time 0 - 24 seconds)
- 800 tuples/second for the next 12 seconds (time 24 - 36 seconds)
- 200 tuples/second for 6 more seconds (time 36 - 42 seconds)

Turning to my topology, I have two types of bolts: (a) a Dispatcher bolt that receives input from the two spouts, and (b) a Consumer bolt that performs an operation on the tuples and maintains some tuples as state. The parallelism hint for the Dispatcher is one (1 executor/thread), since I have verified that it never reaches even 10% of its capacity. For the Consumer bolt I have a parallelism hint of two (2 executors/threads for that bolt).
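To make the rate schedule concrete, here is a minimal, self-contained sketch of the pacing logic a spout could use (illustrative only, not my exact code; the numbers match the schedule above):

```java
// Illustrative pacing helper for the 200/800/200 tuples-per-second schedule.
// targetCount(t) returns how many tuples should have been emitted by
// elapsed millisecond t; the spout emits until it catches up to that count.
public class RateSchedule {
    static long targetCount(long elapsedMs) {
        long t = Math.min(elapsedMs, 42_000); // schedule ends at 42 s
        if (t <= 24_000) {
            return t * 200 / 1000;                          // 200 tuples/s
        } else if (t <= 36_000) {
            return 4_800 + (t - 24_000) * 800 / 1000;       // 800 tuples/s
        } else {
            return 14_400 + (t - 36_000) * 200 / 1000;      // 200 tuples/s
        }
    }
}
```

In nextTuple() one would then emit while the emitted count is below targetCount(now - startMs), which approximates the desired rate without depending on exact per-call timing.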
The input rates I previously mentioned are picked so that I can monitor end-to-end latency of less than 10 msec using the appropriate number of executors on the Consumer bolt. In detail, I have run the same topology with one Consumer executor and it can handle an input rate of 200 tuples/sec with end-to-end latency < 10 msec. Similarly, if I add one more Consumer executor (2 executors in total), the topology can consume 800 tuples/sec with < 10 msec end-to-end latency. At this point, I have to say that if I use 1 Consumer executor for 800 tuples/sec, the end-to-end latency reaches up to 2 seconds. By the way, I should mention that I measure end-to-end latency using the ack() function, i.e., I measure how much time passes between sending a tuple into the topology and having its tuple tree fully acknowledged.
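The bookkeeping behind that measurement can be sketched as a small helper (illustrative, not my exact code): the emit time is recorded per message id when the tuple is emitted, and the difference is taken when the spout's ack(msgId) fires for the completed tuple tree.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative ack()-based latency bookkeeping: onEmit() is called from
// nextTuple() with the message id used for anchoring, and onAck() is called
// from the spout's ack(msgId) once the tuple tree is fully acknowledged.
public class AckLatencyTracker {
    private final Map<Object, Long> emitTimesNanos = new ConcurrentHashMap<Object, Long>();

    public void onEmit(Object msgId, long nowNanos) {
        emitTimesNanos.put(msgId, nowNanos);
    }

    // Returns the end-to-end latency in microseconds, or -1 for an unknown id.
    public long onAck(Object msgId, long nowNanos) {
        Long start = emitTimesNanos.remove(msgId);
        return (start == null) ? -1 : (nowNanos - start) / 1_000;
    }
}
```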
As you realize by now, the goal is to see whether I can maintain end-to-end latency < 10 msec during the input spike by simulating the addition of another Consumer executor. In order to simulate the addition of processing resources for the input spike, I use direct grouping, and before the spike I send tuples to only one of the two Consumer executors. When the spike is detected on the Dispatcher, it starts sending tuples to the other Consumer as well, so that the input load is balanced between the two threads. Hence, I expect that when I start sending tuples to the additional Consumer thread, the end-to-end latency will drop back to its acceptable value. However, this does not happen.
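The redirection logic in the Dispatcher can be sketched roughly as follows (class name, threshold, and spike detection are all illustrative); the returned task id would then be passed to emitDirect() on a stream that the Consumer subscribes to via directGrouping():

```java
// Illustrative target selection for the Dispatcher bolt: before the spike,
// all tuples go to one Consumer task; once the observed input rate crosses
// the threshold, tuples are round-robined between the two Consumer tasks.
public class SpikeRouter {
    private final int taskA, taskB;
    private final double spikeThresholdTuplesPerSec;
    private boolean spikeDetected = false;
    private long toggle = 0;

    public SpikeRouter(int taskA, int taskB, double spikeThresholdTuplesPerSec) {
        this.taskA = taskA;
        this.taskB = taskB;
        this.spikeThresholdTuplesPerSec = spikeThresholdTuplesPerSec;
    }

    public int targetTask(double observedRate) {
        if (observedRate > spikeThresholdTuplesPerSec) {
            spikeDetected = true; // latch: keep both consumers in rotation
        }
        if (!spikeDetected) {
            return taskA;
        }
        return (toggle++ % 2 == 0) ? taskA : taskB;
    }
}
```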
In order to verify my hypothesis that two Consumer executors are able to maintain < 10 msec latency during a spike, I executed the same experiment, but this time I sent tuples to both executors (threads) for the whole lifetime of the experiment. In this case, the end-to-end latency remains stable and at acceptable levels. So, I do not know what really happens in my simulation. I cannot figure out what causes the deterioration of the end-to-end latency when the input load is redirected to the additional Consumer executor.
In order to figure out more about the mechanics of Storm, I did the same setup on a smaller machine and did some profiling. I saw that most of the time is spent in the BlockingWaitStrategy of the LMAX disruptor, and that it dominates the CPU. My actual processing function (in the Consumer bolt) takes only a fraction of the time spent in the LMAX BlockingWaitStrategy. Hence, I think that it is an I/O issue between queues and not something that has to do with the processing of tuples in the Consumer.
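In case it is relevant for an answer: the disruptor wait strategy appears to be configurable per topology in Storm 0.9.x (the key below is taken from defaults.yaml as far as I can tell; treat it as an assumption I have not verified experimentally). A sketch of how one would override it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: backtype.storm.Config extends HashMap, so the same
// put() would be applied to a real Config before submitting the topology.
// The key name is an assumption based on 0.9.x defaults.yaml.
public class WaitStrategyConf {
    static final String KEY = "topology.disruptor.wait.strategy";

    static Map<String, Object> withSleepingWaitStrategy() {
        Map<String, Object> conf = new HashMap<String, Object>();
        conf.put(KEY, "com.lmax.disruptor.SleepingWaitStrategy");
        return conf;
    }
}
```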
Any idea about what goes wrong and why I get this radical/baffling behavior?
Thank you.
First, thanks for the detailed and well formulated question! There are multiple comments from my side (not sure if this is already an answer...):

Hope this helps. If you have more questions and/or information, I can refine my answer later on.