
Apache Storm issue with Dynamic redirection of tuples (baffling impact on end-to-end latency)

Below I describe the issue I am facing in Storm. As a heads-up, I know it is a long post, and any comment or pointer is more than welcome. Here is the description:

I have installed Storm 0.9.4 and ZooKeeper 3.4.6 on a single server (2 sockets with 8-core Intel Xeon chips, 96 GB RAM, running CentOS) and have set up a pseudo-distributed, single-node Storm runtime. My configuration consists of 1 ZooKeeper server, 1 nimbus process, 1 supervisor process, and 1 worker process (when topologies are submitted), all running on the same machine. The purpose of my experiment is to observe Storm's behavior on a single-node setup when the input load is dynamically distributed among executor threads.

For the purpose of my experiment, the input tuples consist of 1 long and 1 integer value. The input data come from two spouts that read tuples from disk files, and I control the input rate to follow this pattern: 200 tuples/second for the first 24 seconds (time 0 - 24 seconds), 800 tuples/second for the next 12 seconds (time 24 - 36 seconds), and 200 tuples/second for 6 more seconds (time 36 - 42 seconds). Turning to my topology, I have two types of bolts: (a) a Dispatcher bolt that receives input from the two spouts, and (b) a Consumer bolt that performs an operation on the tuples and maintains some tuples as state. The parallelism hint for the Dispatcher is one (1 executor/thread), since I have verified that it never reaches even 10% of its capacity. For the Consumer bolt I use a parallelism hint of two (2 executors/threads). The input rates mentioned above are chosen so that I can keep end-to-end latency below 10 msec with the appropriate number of Consumer executors. In detail, I have run the same topology with one Consumer executor and it can handle an input rate of 200 tuples/sec with end-to-end latency < 10 msec. Similarly, if I add one more Consumer executor (2 executors in total), the topology can consume 800 tuples/sec with end-to-end latency < 10 msec. However, with only 1 Consumer executor at 800 tuples/sec, the end-to-end latency reaches up to 2 seconds. I should also mention that I measure end-to-end latency using the ack() mechanism, i.e., how much time passes between sending a tuple into the topology and its tuple tree being fully acknowledged.
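
For reference, here is a minimal sketch of how a topology along these lines could be wired in Storm 0.9.x. The class and file names (FileSpout, DispatcherBolt, ConsumerBolt, input-*.txt) are placeholders, not my actual code:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class SpikeTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // Two rate-controlled spouts reading tuples from disk files (placeholder class).
            builder.setSpout("spout-a", new FileSpout("input-a.txt"), 1);
            builder.setSpout("spout-b", new FileSpout("input-b.txt"), 1);

            // One Dispatcher executor fed by both spouts.
            builder.setBolt("dispatcher", new DispatcherBolt(), 1)
                   .shuffleGrouping("spout-a")
                   .shuffleGrouping("spout-b");

            // Two Consumer executors; direct grouping lets the Dispatcher choose the target task.
            builder.setBolt("consumer", new ConsumerBolt(), 2)
                   .directGrouping("dispatcher");

            Config conf = new Config();
            conf.setNumWorkers(1);  // single worker process, as in my setup
            conf.setNumAckers(1);   // acking enabled, so complete latency is reported

            StormSubmitter.submitTopology("spike-topology", conf, builder.createTopology());
        }
    }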

As you realize by now, the goal is to see whether I can maintain end-to-end latency < 10 msec during the input spike by simulating the addition of another Consumer executor. In order to simulate the addition of processing resources for the input spike, I use direct grouping, and before the spike I send tuples to only one of the two Consumer executors. When the spike is detected by the Dispatcher, it starts sending tuples to the other Consumer as well, so that the input load is balanced between the two threads. Hence, I expect that when I start sending tuples to the additional Consumer thread, the end-to-end latency will drop back to its acceptable value. However, this does not happen.
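
To make the redirection concrete, here is a simplified sketch of what the Dispatcher roughly does (not my actual code; the spike detection itself is omitted and reduced to a flag, and the component id "consumer" is assumed to match the one used when building the topology):

    import java.util.List;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class DispatcherBolt extends BaseRichBolt {
        private OutputCollector collector;
        private List<Integer> consumerTasks;              // task ids of the Consumer bolt
        private int nextTask = 0;
        private volatile boolean spikeDetected = false;   // set by the (omitted) spike-detection logic

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // "consumer" must match the component id used in the TopologyBuilder.
            this.consumerTasks = context.getComponentTasks("consumer");
        }

        @Override
        public void execute(Tuple input) {
            int target;
            if (!spikeDetected) {
                // Before the spike: send everything to the first Consumer task only.
                target = consumerTasks.get(0);
            } else {
                // After the spike: round-robin over both Consumer tasks.
                target = consumerTasks.get(nextTask);
                nextTask = (nextTask + 1) % consumerTasks.size();
            }
            // Anchor on the input tuple so the ack tree stays intact.
            // Assumes the spouts emit (long, integer) in that field order.
            collector.emitDirect(target, input, new Values(input.getLong(0), input.getInteger(1)));
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Direct stream: the emitting task chooses the receiving task.
            declarer.declare(true, new Fields("ts", "value"));
        }
    }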

In order to verify my hypothesis that two Consumer executors are able to maintain < 10 msec latency during a spike, I execute the same experiment, but this time I send tuples to both executors (threads) for the whole lifetime of the experiment. In this case, the end-to-end latency remains stable and at acceptable levels. So, I do not know what really happens in my simulation, and I cannot figure out what causes the deterioration of the end-to-end latency when the input load is redirected to the additional Consumer executor.

In order to learn more about the mechanics of Storm, I repeated the same setup on a smaller machine and did some profiling. I saw that most of the time is spent in the BlockingWaitStrategy of the LMAX Disruptor, and that it dominates the CPU. My actual processing function (in the Consumer bolt) accounts for only a fraction of the time spent in the BlockingWaitStrategy. Hence, I think that it is an I/O issue between queues rather than something related to the processing of tuples in the Consumer.

Any idea what goes wrong and why I see this radical/baffling behavior?

Thank you.

First, thanks for the detailed and well-formulated question! There are multiple comments from my side (I am not sure if this is already an answer...):

  1. Your experiment is rather short (time ranges below 1 minute), which I think might not yield reliable numbers.
  2. How do you detect the spike?
  3. Are you aware of the internal buffer mechanisms in Storm? Have a look here: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
  4. How many ackers did you configure?
  5. I assume that during your spike period, before you detect the spike, the buffers fill up and it takes some time to empty them. Thus the latency does not drop immediately (maybe extending your last period resolves this).
  6. Using the ack mechanism is done by many people; however, it is rather imprecise. First, it only shows an average value (a quantile or the max would be much better to use). Furthermore, the measured value is not what should be considered the latency in the first place. For example, if you hold a tuple in an internal state for some time and do not ack it until the tuple is removed from the state, Storm's "latency" value would increase, which does not make sense for a latency measurement. The usual definition of latency is to take the output timestamp of a result tuple and subtract the emit timestamp of the source tuple (if there are multiple source tuples, you use the youngest, i.e., maximum, timestamp over all source tuples). The tricky part is to figure out the corresponding source tuples for each output tuple... As an alternative, some people inject dummy tuples that carry their emit timestamp as data. These dummy tuples are forwarded by each operator immediately, and the sink operator can easily compute a latency value since it has access to the emit timestamp that is carried around. This is a quite good approximation of the actual latency as described before (see the sketch below).
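
To illustrate point 6, here is a rough, untested sketch of the consumer/sink side. The stream id "latency-marker", the class name, and the field layout are made up for illustration; since everything runs on one machine, you do not even need clock synchronization:

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    // Illustrative sink: computes latency from marker tuples that carry their emit timestamp.
    public class LatencyAwareConsumer extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            if ("latency-marker".equals(input.getSourceStreamId())) {
                // The marker carries the wall-clock time at which the source emitted it.
                long emittedAt = input.getLong(0);
                long latencyMs = System.currentTimeMillis() - emittedAt;
                // Record the sample however you prefer (histogram, metrics, plain log, ...).
                System.out.println("observed end-to-end latency: " + latencyMs + " ms");
            } else {
                // Regular data tuple: do the actual processing here.
            }
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Sink operator: no output streams declared.
        }
    }

Upstream, the emitting component would declare the extra stream, e.g. declarer.declareStream("latency-marker", new Fields("emittedAt")), and periodically emit a marker with collector.emit("latency-marker", new Values(System.currentTimeMillis())). Every intermediate bolt just forwards markers on the same stream without buffering them.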

Hope this helps. If you have more questions and/or can provide more information, I can refine my answer later on.


 