Why does the Disruptor hold lots of data when the producer is much faster than the consumer?

Question

I'm learning about the LMAX Disruptor and have a problem: when I have a large ring buffer (for example, 1024 slots) and my producer is much faster than my consumer, the ring buffer holds a lot of data but does not publish the events until my application ends. As a result, my application loses a lot of data (it is not a daemon).

I've tried slowing down the producer, which works, but I can't use this approach in my application because it would greatly reduce performance.

val ringBufferSize = 1024
val disruptor = new Disruptor[util.Map[String, Object]](
  new MessageEventFactory,
  ringBufferSize,
  new MessageThreadFactory,
  ProducerType.MULTI,
  new BlockingWaitStrategy)

disruptor.handleEventsWith(new MessageEventHandler(batchSize, this))
disruptor.setDefaultExceptionHandler(new MessageExceptionHandler)
val ringBuffer = disruptor.start
val producer = new MessageEventProducer(ringBuffer)
part.foreach { row =>
  // Thread.sleep(2000)
  accm.add(1)
  producer.onData(row)
  // flush(row)
}

I want to control the Disruptor's batch size myself. Is there also a way to consume the remaining data held in the ring buffer when my application ends?

Answer 1

If you let your application end abruptly, your consumers will end abruptly too, of course. There is no need to slow down the producer; you simply need to block your application from exiting until all consumers (i.e. event handlers) have finished working on the outstanding events.

The normal way to do this is to invoke Disruptor.shutdown() on the main thread, thus blocking the application from exiting until Disruptor.shutdown() has returned.

In your code snippet above, you'd add that call after the part.foreach statement, before the routine returns. Because the call blocks until the outstanding events have been processed, all events are handled to completion before the routine exits.
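A minimal sketch of that pattern, assuming the Disruptor 3.x/4.x DSL is on the classpath. `ShutdownSketch`, `RowEvent` and the counting handler are hypothetical stand-ins for the question's `MessageEvent*` classes, which are not shown:

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicLong
import com.lmax.disruptor.{BlockingWaitStrategy, EventFactory, EventHandler}
import com.lmax.disruptor.dsl.{Disruptor, ProducerType}

object ShutdownSketch {
  // Hypothetical event type; the question uses util.Map[String, Object] instead.
  final class RowEvent { var row: String = _ }

  // Publishes `count` rows, then blocks until the handler has processed all of them.
  def run(count: Int): Long = {
    val handled = new AtomicLong(0)
    val disruptor = new Disruptor[RowEvent](
      new EventFactory[RowEvent] { def newInstance(): RowEvent = new RowEvent },
      1024,
      Executors.defaultThreadFactory(),
      ProducerType.MULTI,
      new BlockingWaitStrategy)

    disruptor.handleEventsWith(new EventHandler[RowEvent] {
      def onEvent(event: RowEvent, sequence: Long, endOfBatch: Boolean): Unit = {
        handled.incrementAndGet()
        ()
      }
    })
    val ringBuffer = disruptor.start()

    (1 to count).foreach { i =>
      val sequence = ringBuffer.next() // claim a slot; waits while the ring is full
      try {
        ringBuffer.get(sequence).row = s"row-$i"
      } finally {
        ringBuffer.publish(sequence) // make the slot visible to the handler
      }
    }

    // Blocks until every published event has been handled, then halts the processors.
    disruptor.shutdown()
    handled.get()
  }
}
```

With this in place, `ShutdownSketch.run(5000)` should return only after the handler has seen all 5000 rows, even though the ring holds just 1024 slots. If an unbounded wait is a concern, `Disruptor` also offers a `shutdown(timeout, timeUnit)` overload that gives up after the given time.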

The Disruptor excels mainly at buffering (smoothing out) bursts of data coming from a single (extremely fast) producer thread or from multiple (still pretty fast) producer threads, and feeding that data to consumers that perform in a predictable manner, thus eliminating as much latency and lock-contention overhead as possible. If your producers are in fact much faster than your consumers, you may find that simply invoking the consumer code from within your lambda yields similar or better results. The exceptions are advanced techniques such as batching, or setting up the Disruptor to run multiple instances of the same consumer in parallel threads; the latter requires the event handler implementation to be modified (see the Disruptor FAQ).
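As a rough illustration of batching inside a handler, the sketch below uses the `endOfBatch` flag that the Disruptor passes to `EventHandler.onEvent`. `BatchingSketch`, `RowEvent`, `BatchingHandler`, `maxBatch` and `flush` are hypothetical names; the question's `MessageEventHandler` is not shown, so this is not a reconstruction of it:

```scala
import com.lmax.disruptor.EventHandler
import scala.collection.mutable.ArrayBuffer

object BatchingSketch {
  // Hypothetical event type carrying one row.
  final class RowEvent { var row: String = _ }

  // Collects rows and hands them to `flush` in chunks of at most `maxBatch`.
  final class BatchingHandler(maxBatch: Int, flush: Seq[String] => Unit)
      extends EventHandler[RowEvent] {

    private val pending = new ArrayBuffer[String](maxBatch)

    override def onEvent(event: RowEvent, sequence: Long, endOfBatch: Boolean): Unit = {
      pending += event.row
      // Flush when the chunk is full, or when the Disruptor signals that this is
      // the last event currently available, so that nothing lingers in `pending`.
      if (pending.size >= maxBatch || endOfBatch) {
        flush(pending.toList)
        pending.clear()
      }
    }
  }
}
```

The design point is the `|| endOfBatch` condition: a handler that flushes only when its own buffer is full can hold back a partial chunk indefinitely, whereas flushing on `endOfBatch` caps the chunk size without leaving a tail of rows unprocessed.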

In your example, it seems that all you are trying to accomplish is to feed an already available set of data (your "part" collection) into a single event handler (MessageEventHandler). In such a use case, you might be better off writing something like part.stream().parallel().forEach(... messageEventHandler.onEvent(event) ...)
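A minimal sketch of that direct approach, written in Scala against `java.util` streams only. `DirectCallSketch` and `handle` are hypothetical stand-ins for the real per-row work, and the rows are assumed to be available as a `java.util.List`:

```scala
import java.util.concurrent.atomic.AtomicLong
import java.util.function.Consumer

object DirectCallSketch {
  // Hypothetical stand-in for the handler's per-row work; must be thread-safe.
  def handle(row: String, processed: AtomicLong): Unit = {
    processed.incrementAndGet()
    ()
  }

  // Processes every row on the common fork-join pool and returns the number handled.
  def run(rows: java.util.List[String]): Long = {
    val processed = new AtomicLong(0)
    rows.parallelStream().forEach(new Consumer[String] {
      override def accept(row: String): Unit = handle(row, processed)
    })
    // forEach is a terminal operation, so all rows have been handled at this point.
    processed.get()
  }
}
```

Because `forEach` returns only after every element has been processed, no separate shutdown step is needed here. The trade-off is that the per-row code runs on several threads at once and therefore has to be thread-safe.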
