Why is disruptor slower with smaller ring buffer?
Following the Disruptor Getting Started guide, I built a minimal disruptor with a single producer and a single consumer.
Producer
import com.lmax.disruptor.RingBuffer;

public class LongEventProducer
{
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }

    public void onData()
    {
        long sequence = ringBuffer.next();
        try
        {
            LongEvent event = ringBuffer.get(sequence);
        }
        finally
        {
            ringBuffer.publish(sequence);
        }
    }
}
Consumer (note that the consumer does nothing in onEvent)
import com.lmax.disruptor.EventHandler;

public class LongEventHandler implements EventHandler<LongEvent>
{
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
    {}
}
My goal was to compare one performance test over a large ring buffer against many traversals of a smaller ring. In each case the total number of operations (bufferSize x rotations) is the same. What I found is that the ops/sec rate degrades dramatically as the ring buffer gets smaller.
RingBuffer Size | Revolutions | Total Ops | Mops/sec
        1048576 |           1 |   1048576 | 50-60
           1024 |        1024 |   1048576 | 8-16
             64 |       16384 |   1048576 | 0.5-0.7
              8 |      131072 |   1048576 | 0.12-0.14
Question: what is the cause of the large performance degradation when the ring buffer size is reduced but the total number of iterations is held fixed? This trend is independent of the WaitStrategy and of Single vs MultiProducer: throughput is lower, but the trend is the same.
Main (note SingleProducer and BusySpinWaitStrategy)
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMainJava {
    static double ONEMILLION = 1000000.0;
    static double ONEBILLION = 1000000000.0;

    public static void main(String[] args) throws Exception {
        // Executor that will be used to construct new threads for consumers
        Executor executor = Executors.newCachedThreadPool();

        // TUNABLE PARAMS
        int ringBufferSize = 1048576; // 1024, 64, 8
        int rotations = 1;            // 1024, 16384, 131072

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(new LongEventFactory(), ringBufferSize,
                executor, ProducerType.SINGLE, new BusySpinWaitStrategy());

        // Connect the handler
        disruptor.handleEventsWith(new LongEventHandler());

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
        LongEventProducer producer = new LongEventProducer(ringBuffer);

        long start = System.nanoTime();
        long totalIterations = (long) rotations * ringBufferSize;
        for (long i = 0; i < totalIterations; i++) {
            producer.onData();
        }
        double duration = (System.nanoTime() - start) / ONEBILLION;

        System.out.println(String.format(
                "Buffersize: %s, rotations: %s, total iterations = %s, duration: %.2f seconds, rate: %.2f Mops/s",
                ringBufferSize, rotations, totalIterations, duration,
                totalIterations / (ONEMILLION * duration)));
    }
}
To run this you also need the trivial factory code:
import com.lmax.disruptor.EventFactory;

public class LongEventFactory implements EventFactory<LongEvent>
{
    public LongEvent newInstance()
    {
        return new LongEvent();
    }
}
Run on a Core i5-2400, 12 GB RAM, Windows 7.

Sample output:
Buffersize: 1048576, rotations: 1, total iterations = 1048576, duration: 0.02 seconds, rate: 59.03 Mops/s
Buffersize: 64, rotations: 16384, total iterations = 1048576, duration: 2.01 seconds, rate: 0.52 Mops/s
When the producer fills the ring buffer, it has to wait until events have been consumed before it can continue.

When the buffer is exactly as large as the number of elements you are going to put into it, the producer never has to wait; it never wraps around. All it ever does is increment a count, advance the index, and publish the data at that index in the ring.

When the buffer is smaller, the producer is still just incrementing and publishing, but it does so faster than the consumer can consume. The producer therefore has to wait until elements have been consumed and space on the ring buffer has been freed.
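That waiting condition can be sketched with plain counters. This is a simplified model of the sequencer's wrap-point arithmetic, not the actual Disruptor code, and the sequence numbers are made up for illustration:

```java
public class WrapPointDemo {
    public static void main(String[] args) {
        long bufferSize = 8;
        long nextValue = 14;        // producer has already claimed sequences 0..14
        long consumerSequence = 5;  // consumer has processed up to sequence 5

        // Same arithmetic the sequencer uses: the next claim would land on a
        // slot that wraps onto one the consumer may not have released yet.
        long wrapPoint = (nextValue + 1) - bufferSize;          // = 7
        boolean mustWait = wrapPoint > consumerSequence;        // 7 > 5 -> wait
        System.out.println("wrapPoint=" + wrapPoint + ", mustWait=" + mustWait);
    }
}
```

With a buffer as large as the total number of events, `wrapPoint` never exceeds the consumer's sequence, so the check never forces the producer to wait.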
The problem seems to be in this code in com.lmax.disruptor.SingleProducerSequencer:
if (wrapPoint > cachedGatingSequence || cachedGatingSequence > nextValue)
{
    cursor.setVolatile(nextValue); // StoreLoad fence

    long minSequence;
    while (wrapPoint > (minSequence = Util.getMinimumSequence(gatingSequences, nextValue)))
    {
        waitStrategy.signalAllWhenBlocking();
        LockSupport.parkNanos(1L); // TODO: Use waitStrategy to spin?
    }

    this.cachedValue = minSequence;
}
In particular, note the call to LockSupport.parkNanos(1L). On Windows it can take up to 15 ms. It is invoked whenever the producer reaches the end of the buffer and has to wait for the consumer.
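The real cost of that call is easy to measure directly. The sketch below times repeated `parkNanos(1L)` calls; the actual average depends heavily on OS and timer resolution, so no specific figure should be assumed:

```java
import java.util.concurrent.locks.LockSupport;

public class ParkNanosCost {
    public static void main(String[] args) {
        // Warm up so JIT compilation does not distort the measurement
        for (int i = 0; i < 1_000; i++) {
            LockSupport.parkNanos(1L);
        }

        int iterations = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(1L); // asks for 1 ns, actually parks far longer
        }
        long avgNanos = (System.nanoTime() - start) / iterations;

        // Typically tens of microseconds on Linux; on Windows it can be a
        // whole timer tick, which is where the up-to-15 ms figure comes from.
        System.out.println("average parkNanos(1L) cost: " + avgNanos + " ns");
    }
}
```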
Secondly, when the buffer is small, false sharing of the RingBuffer can occur. I suspect both effects are at play.
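False sharing is usually mitigated by padding hot fields onto their own cache lines. A minimal sketch of the idea, not the Disruptor's actual field layout (and note the JVM is free to reorder or eliminate unused fields, which is why the real library uses more elaborate tricks):

```java
public class PaddedSequence {
    static class Padded {
        // Padding on both sides aims to keep `value` alone on its 64-byte
        // cache line, so a writer does not invalidate a neighbour's line.
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long value;
        long p9, p10, p11, p12, p13, p14, p15;
    }

    public static void main(String[] args) {
        Padded seq = new Padded();
        seq.value = 42;
        System.out.println("value=" + seq.value);
    }
}
```

With a small ring, many slots share a cache line, so the producer and consumer touching adjacent slots can contend on the same line even though they never touch the same slot.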
Finally, before benchmarking I was able to warm the code up in the JIT with a million calls to onData(). That improved the best case to >80 Mops/sec, but it did not eliminate the degradation as the buffer shrinks.
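The warm-up described above can be sketched like this. The `onData()` here is a stand-in for the producer call in the question, so the hot path gets compiled before the timed section starts; in the real benchmark you would call `producer.onData()` instead:

```java
public class WarmupDemo {
    static long counter = 0;

    // Stand-in for the real producer.onData()
    static void onData() {
        counter++;
    }

    public static void main(String[] args) {
        // Warm-up phase: run the hot path enough times for the JIT to compile it
        for (long i = 0; i < 1_000_000; i++) {
            onData();
        }

        // Only now start the timed benchmark
        long start = System.nanoTime();
        for (long i = 0; i < 1_000_000; i++) {
            onData();
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.printf("rate: %.2f Mops/s%n", 1_000_000 / (seconds * 1_000_000.0));
    }
}
```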