
Using LMAX Disruptor (3.0) in Java to process millions of documents

I have the following use case:

When my service starts, it may need to process millions of documents in as short a burst as possible. There will be three sources of data.

I have set up the following:

    // batchSize = 100, bufferSize = 2^30
    public MyDisruptor(@NonNull final MyDisruptorConfig config) {
        batchSize = config.getBatchSize();
        bufferSize = config.getBufferSize();
        this.eventHandler = config.getEventHandler();
        ThreadFactory threadFactory = createThreadFactory("disruptor-threads-%d");
        executorService = Executors.newSingleThreadExecutor(threadFactory);
        ringBuffer = RingBuffer.createMultiProducer(new EventFactory(), bufferSize, new YieldingWaitStrategy());
        sequenceBarrier = ringBuffer.newBarrier();
        batchEventProcessor = new BatchEventProcessor<>(ringBuffer, sequenceBarrier, eventHandler);
        ringBuffer.addGatingSequences(batchEventProcessor.getSequence());
        executorService.submit(batchEventProcessor);
    }

    public void consume(@NonNull final List<Document> documents) {
        List<List<Document>> subLists = Lists.partition(documents, batchSize);
        for (List<Document> subList : subLists) {
            log.info("publishing sublist of size {}", subList.size());
            long high = ringBuffer.next(subList.size());
            long low = high - (subList.size() - 1);
            long position = low;
            for (Document document: subList) {
                ringBuffer.get(position++).setEvent(document);
            }
            ringBuffer.publish(low, high);
            lastPublishedSequence.set(high);
        }
    }

Each of my sources calls consume, and I use Guice to create a singleton disruptor.

My eventHandler routine is:

    public void onEvent(Event event, long sequence, boolean endOfBatch) throws Exception {
        Document document = event.getValue();
        handler.processDocument(document); //send the document to handler
        if (endOfBatch) {
            handler.processDocumentsList(); // tell handler to process all documents so far.
        }
    }

My logs show that the producer (consume) stalls at times. I assume this happens when the ringBuffer is full and the eventHandler cannot process events quickly enough. The logs show the eventHandler processing documents, and after a while the producer starts publishing more documents to the ring buffer.

Questions:

  • Am I using the correct Disruptor pattern? I see there are quite a few ways to use it. I chose the batchEventProcessor so that it would signal endOfBatch.
  • How can I increase the efficiency of my EventHandler? processDocumentsList can be slow.
  • Should I use parallel EventHandlers? The LMAX user guide mentions that this is possible, and the FAQ has a question on it. But how do I use this with the batchEventProcessor? It only takes one eventHandler.

Is your handler stateful? If not, you can use multiple parallel event handlers to process the documents. You could implement a basic sharding strategy where only one of the handlers processes each event.
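The sharding idea could be sketched roughly as follows. This is a minimal illustration, not the library's API: Handler is a local stand-in with the same method shape as Disruptor's EventHandler, declared only so the example compiles without the library, and ShardedHandler, ShardingDemo and the modulo-on-sequence routing rule are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in with the same method shape as com.lmax.disruptor.EventHandler<T>.
// Declared here only so the sketch compiles without the library on the classpath.
interface Handler<T> {
    void onEvent(T event, long sequence, boolean endOfBatch) throws Exception;
}

// Hypothetical sharded handler: every instance sees every event,
// but only processes the events whose sequence maps to its own shard.
final class ShardedHandler implements Handler<String> {
    private final int shard;
    private final int shardCount;
    final List<String> processed = new ArrayList<>();

    ShardedHandler(int shard, int shardCount) {
        this.shard = shard;
        this.shardCount = shardCount;
    }

    @Override
    public void onEvent(String event, long sequence, boolean endOfBatch) {
        if (sequence % shardCount != shard) {
            return; // another handler owns this sequence (endOfBatch is ignored here too)
        }
        processed.add(event); // placeholder for the real document processing
    }
}

final class ShardingDemo {
    public static void main(String[] args) {
        ShardedHandler h0 = new ShardedHandler(0, 2);
        ShardedHandler h1 = new ShardedHandler(1, 2);
        // Simulate the ring buffer delivering each event to both handlers.
        for (long seq = 0; seq < 6; seq++) {
            String doc = "doc-" + seq;
            h0.onEvent(doc, seq, seq == 5);
            h1.onEvent(doc, seq, seq == 5);
        }
        System.out.println(h0.processed); // [doc-0, doc-2, doc-4]
        System.out.println(h1.processed); // [doc-1, doc-3, doc-5]
    }
}
```

With the real library, each handler would need its own BatchEventProcessor (or the handlers could be passed together to the DSL's handleEventsWith), and every processor's sequence would have to be registered as a gating sequence. If the handler also does work on endOfBatch, as in the question, that check would have to run before the early return, because a handler can receive endOfBatch on an event it does not own.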

endOfBatch is usually used to speed up processing by optimising IO operations that benefit from batching, e.g. writing to a file on each event but only flushing on endOfBatch.
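As an illustration of that write-per-event, flush-per-batch pattern, here is a small sketch. BatchFlushingWriter and BatchFlushDemo are hypothetical names; the onEvent method only mirrors the shape of Disruptor's EventHandler.onEvent so that the example runs without the library, and a StringWriter stands in for the file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical handler: writes every event into a buffer,
// but only pushes the buffer to the underlying sink at the end of a batch.
final class BatchFlushingWriter {
    private final Writer out;
    int flushCount = 0; // exposed only to make the effect visible in the demo

    BatchFlushingWriter(Writer sink) {
        this.out = new BufferedWriter(sink);
    }

    // Same parameter shape as EventHandler.onEvent(event, sequence, endOfBatch).
    public void onEvent(String line, long sequence, boolean endOfBatch) {
        try {
            out.write(line);
            out.write('\n');
            if (endOfBatch) {
                out.flush(); // one flush per batch instead of one per event
                flushCount++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class BatchFlushDemo {
    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stands in for a file
        BatchFlushingWriter handler = new BatchFlushingWriter(sink);
        handler.onEvent("a", 0, false);
        handler.onEvent("b", 1, false);
        System.out.println(sink.toString().isEmpty()); // true: nothing flushed yet
        handler.onEvent("c", 2, true);
        System.out.println(sink.toString().equals("a\nb\nc\n")); // true: whole batch flushed at once
    }
}
```

The same shape would apply to the handler in the question if processDocumentsList is the expensive step: the per-event work stays cheap and the costly call happens once per batch.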

It's hard to give any more advice without knowing what happens in your document processor.

