Why is disruptor slower with smaller ring buffer?
Following the Disruptor Getting Started guide, I built a minimal disruptor with a single producer and a single consumer.
Producer
import com.lmax.disruptor.RingBuffer;

public class LongEventProducer
{
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }

    public void onData()
    {
        long sequence = ringBuffer.next();
        try
        {
            LongEvent event = ringBuffer.get(sequence);
        }
        finally
        {
            ringBuffer.publish(sequence);
        }
    }
}
Consumer (note that the consumer does nothing in onEvent)
import com.lmax.disruptor.EventHandler;

public class LongEventHandler implements EventHandler<LongEvent>
{
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
    {}
}
My goal was to compare one performance test over a large ring buffer against many traversals of a smaller ring. In each case the total number of operations (bufferSize x rotations) is the same. What I found is that the ops/sec rate degrades dramatically as the ring buffer gets smaller.
RingBuffer Size | Revolutions | Total Ops | Mops/sec
        1048576 |           1 |   1048576 | 50-60
           1024 |        1024 |   1048576 | 8-16
             64 |       16384 |   1048576 | 0.5-0.7
              8 |      131072 |   1048576 | 0.12-0.14
Question: what is the cause of the large performance degradation when the ring buffer size is reduced but the total number of iterations is held fixed? This trend is independent of the WaitStrategy and of Single vs MultiProducer: throughput is lower, but the trend is the same.
Main (note SingleProducer and BusySpinWaitStrategy)
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMainJava {
    static double ONEMILLION = 1000000.0;
    static double ONEBILLION = 1000000000.0;

    public static void main(String[] args) throws Exception {
        // Executor that will be used to construct new threads for consumers
        Executor executor = Executors.newCachedThreadPool();

        // TUNABLE PARAMS
        int ringBufferSize = 1048576; // 1024, 64, 8
        int rotations = 1;            // 1024, 16384, 131072

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(new LongEventFactory(), ringBufferSize,
                executor, ProducerType.SINGLE, new BusySpinWaitStrategy());

        // Connect the handler
        disruptor.handleEventsWith(new LongEventHandler());

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
        LongEventProducer producer = new LongEventProducer(ringBuffer);

        long start = System.nanoTime();
        long totalIterations = (long) rotations * ringBufferSize;
        for (long i = 0; i < totalIterations; i++) {
            producer.onData();
        }
        double duration = (System.nanoTime() - start) / ONEBILLION;

        System.out.println(String.format(
                "Buffersize: %s, rotations: %s, total iterations = %s, duration: %.2f seconds, rate: %.2f Mops/s",
                ringBufferSize, rotations, totalIterations, duration,
                totalIterations / (ONEMILLION * duration)));
    }
}
To run this you also need the trivial factory code:
import com.lmax.disruptor.EventFactory;

public class LongEventFactory implements EventFactory<LongEvent>
{
    public LongEvent newInstance()
    {
        return new LongEvent();
    }
}
Run on a Core i5-2400, 12 GB RAM, Windows 7.

Sample output:
Buffersize: 1048576, rotations: 1, total iterations = 1048576, duration: 0.02 seconds, rate: 59.03 Mops/s
Buffersize: 64, rotations: 16384, total iterations = 1048576, duration: 2.01 seconds, rate: 0.52 Mops/s
When the producer fills the ring buffer, it has to wait until events have been consumed before it can continue.

When the buffer is exactly as large as the number of elements you are going to put into it, the producer never has to wait; it never wraps around. All it ever does is increment a count, advance the index, and publish the data at that index in the ring.

When the buffer is smaller, the producer is still just incrementing and publishing, but it does so faster than the consumer can consume. The producer therefore has to wait until elements have been consumed and space on the ring buffer has been freed.
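That waiting condition can be sketched with plain counters. This is a simplified model of the sequencer's wrap-point arithmetic, not the actual Disruptor code, and the sequence numbers are made up for illustration:

```java
public class WrapPointDemo {
    public static void main(String[] args) {
        long bufferSize = 8;
        long nextValue = 14;        // producer has already claimed sequences 0..14
        long consumerSequence = 5;  // consumer has processed up to sequence 5

        // Same arithmetic the sequencer uses: the next claim would land on a
        // slot that wraps onto one the consumer may not have released yet.
        long wrapPoint = (nextValue + 1) - bufferSize;          // = 7
        boolean mustWait = wrapPoint > consumerSequence;        // 7 > 5 -> wait
        System.out.println("wrapPoint=" + wrapPoint + ", mustWait=" + mustWait);
    }
}
```

With a buffer as large as the total number of events, `wrapPoint` never exceeds the consumer's sequence, so the check never forces the producer to wait.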
The problem seems to be in this code in com.lmax.disruptor.SingleProducerSequencer:
if (wrapPoint > cachedGatingSequence || cachedGatingSequence > nextValue)
{
    cursor.setVolatile(nextValue); // StoreLoad fence

    long minSequence;
    while (wrapPoint > (minSequence = Util.getMinimumSequence(gatingSequences, nextValue)))
    {
        waitStrategy.signalAllWhenBlocking();
        LockSupport.parkNanos(1L); // TODO: Use waitStrategy to spin?
    }

    this.cachedValue = minSequence;
}
In particular, note the call to LockSupport.parkNanos(1L). On Windows it can take up to 15 ms. It is invoked whenever the producer reaches the end of the buffer and has to wait for the consumer.
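The real cost of that call is easy to measure directly. The sketch below times repeated `parkNanos(1L)` calls; the actual average depends heavily on OS and timer resolution, so no specific figure should be assumed:

```java
import java.util.concurrent.locks.LockSupport;

public class ParkNanosCost {
    public static void main(String[] args) {
        // Warm up so JIT compilation does not distort the measurement
        for (int i = 0; i < 1_000; i++) {
            LockSupport.parkNanos(1L);
        }

        int iterations = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(1L); // asks for 1 ns, actually parks far longer
        }
        long avgNanos = (System.nanoTime() - start) / iterations;

        // Typically tens of microseconds on Linux; on Windows it can be a
        // whole timer tick, which is where the up-to-15 ms figure comes from.
        System.out.println("average parkNanos(1L) cost: " + avgNanos + " ns");
    }
}
```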
Secondly, when the buffer is small, false sharing of the RingBuffer can occur. I suspect both effects are at play.
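False sharing is usually mitigated by padding hot fields onto their own cache lines. A minimal sketch of the idea, not the Disruptor's actual field layout (and note the JVM is free to reorder or eliminate unused fields, which is why the real library uses more elaborate tricks):

```java
public class PaddedSequence {
    static class Padded {
        // Padding on both sides aims to keep `value` alone on its 64-byte
        // cache line, so a writer does not invalidate a neighbour's line.
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long value;
        long p9, p10, p11, p12, p13, p14, p15;
    }

    public static void main(String[] args) {
        Padded seq = new Padded();
        seq.value = 42;
        System.out.println("value=" + seq.value);
    }
}
```

With a small ring, many slots share a cache line, so the producer and consumer touching adjacent slots can contend on the same line even though they never touch the same slot.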
Finally, before benchmarking I was able to warm the code up in the JIT with a million calls to onData(). That improved the best case to >80 Mops/sec, but it did not eliminate the degradation as the buffer shrinks.
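The warm-up described above can be sketched like this. The `onData()` here is a stand-in for the producer call in the question, so the hot path gets compiled before the timed section starts; in the real benchmark you would call `producer.onData()` instead:

```java
public class WarmupDemo {
    static long counter = 0;

    // Stand-in for the real producer.onData()
    static void onData() {
        counter++;
    }

    public static void main(String[] args) {
        // Warm-up phase: run the hot path enough times for the JIT to compile it
        for (long i = 0; i < 1_000_000; i++) {
            onData();
        }

        // Only now start the timed benchmark
        long start = System.nanoTime();
        for (long i = 0; i < 1_000_000; i++) {
            onData();
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.printf("rate: %.2f Mops/s%n", 1_000_000 / (seconds * 1_000_000.0));
    }
}
```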