Suppose we execute Thread.sleep(1) within a loop iterating n times (here and below: Java 11):
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
public class ThreadSleep1Benchmark {

    @Param({"5", "10", "50"})
    long delay;

    @Benchmark
    public int sleep() throws Exception {
        for (int i = 0; i < delay; i++) {
            Thread.sleep(1);
        }
        return hashCode();
    }
}
This benchmark demonstrates the following results:
Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   6.552 ± 0.071  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  13.343 ± 0.227  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  68.059 ± 1.441  ms/op
Here we see that the sleep() method takes noticeably more than n milliseconds, while intuitively we would expect it to take ~n ms, since at each iteration the current thread sleeps for 1 ms. This example demonstrates the cost of putting a thread to sleep and waking it up.
Let's now modify the benchmark:
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
public class ThreadSleep2Benchmark {

    private final ExecutorService executor = Executors.newFixedThreadPool(1);

    volatile boolean flag;

    @Param({"5", "10", "50"})
    long delay;

    @Setup(Level.Invocation)
    public void setUp() {
        flag = true;
        startThread();
    }

    @TearDown(Level.Trial)
    public void tearDown() {
        executor.shutdown();
    }

    @Benchmark
    public int sleep() throws Exception {
        while (flag) {
            Thread.sleep(1);
        }
        return hashCode();
    }

    private void startThread() {
        executor.submit(() -> {
            try {
                Thread.sleep(delay);
                flag = false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        });
    }
}
Here we run a background thread that waits for n milliseconds and then lowers the flag, while the sleep() method spins in the while (flag) loop, sleeping 1 ms per iteration. Since the flag is lowered after a delay of n milliseconds, we expect the loop to iterate approximately n times.
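The same "how many iterations actually fit" question can be sketched without the flag at all, by counting Thread.sleep(1) calls inside a fixed time window (a simplified illustration, not the benchmark code; the 50 ms window is an arbitrary choice):

```java
// Sketch: count how many Thread.sleep(1) iterations fit into a fixed window.
// If each sleep(1) really took 1 ms, a 50 ms window would fit ~50 iterations;
// fewer iterations means each call overshoots 1 ms.
public class SleepIterations {

    static int iterationsWithin(long windowMillis) throws InterruptedException {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        int count = 0;
        while (System.nanoTime() < deadline) {
            Thread.sleep(1);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(iterationsWithin(50) + " iterations in 50 ms");
    }
}
```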
And again we see the costs of Thread.sleep(1), but they appear to be almost the same for delays of 5 and 10, and significantly lower when delay is 50. Note that the difference between the two benchmarks is not linear: it is ~0.1 ms for 5, ~1.2 ms for 10 and ~13 ms for 50.
Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep2Benchmark.sleep        5  avgt   50   6.760 ± 0.070  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  12.496 ± 0.050  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  54.727 ± 0.599  ms/op
On Java 18 the results are similar:
Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   6.609 ± 0.105  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  13.233 ± 0.148  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  66.017 ± 0.714  ms/op
ThreadSleep2Benchmark.sleep        5  avgt   50   6.740 ± 0.067  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  12.400 ± 0.112  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  53.836 ± 0.250  ms/op
So my question is: is the cost reduction in ThreadSleep2Benchmark the compiler's achievement (loop unrolling, etc.), or does it come from how I iterate over the loops?
UPD
On Linux I got the following results:
Java 11
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep1Benchmark.sleep 5 avgt 50 5.597 ± 0.038 ms/op
ThreadSleep1Benchmark.sleep 10 avgt 50 11.263 ± 0.069 ms/op
ThreadSleep1Benchmark.sleep 50 avgt 50 56.079 ± 0.267 ms/op
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep2Benchmark.sleep 5 avgt 50 5.600 ± 0.032 ms/op
ThreadSleep2Benchmark.sleep 10 avgt 50 10.558 ± 0.052 ms/op
ThreadSleep2Benchmark.sleep 50 avgt 50 50.625 ± 0.049 ms/op
Java 18
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep1Benchmark.sleep 5 avgt 50 5.581 ± 0.041 ms/op
ThreadSleep1Benchmark.sleep 10 avgt 50 11.069 ± 0.067 ms/op
ThreadSleep1Benchmark.sleep 50 avgt 50 55.719 ± 0.602 ms/op
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep2Benchmark.sleep 5 avgt 50 5.574 ± 0.035 ms/op
ThreadSleep2Benchmark.sleep 10 avgt 50 10.918 ± 0.035 ms/op
ThreadSleep2Benchmark.sleep 50 avgt 50 50.823 ± 0.055 ms/op
If you want more control over pausing a Java thread, have a look at LockSupport.parkNanos. Under Linux, by default, you can get 50 µs resolution. For more info and how to tune it, see https://hazelcast.com/blog/locksupport-parknanos-under-the-hood-and-the-curious-case-of-parking/
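A minimal sketch of that suggestion (the class name and the 50 µs request are illustrative; parkNanos may also return spuriously, so the measured pause is best-effort):

```java
import java.util.concurrent.locks.LockSupport;

// Sketch: LockSupport.parkNanos as a finer-grained pause than Thread.sleep.
// It takes a nanosecond argument, but the achievable resolution still
// depends on the OS and its timer configuration.
public class ParkNanosDemo {

    static long avgPauseMicros(long requestNanos, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            LockSupport.parkNanos(requestNanos); // may return early/spuriously
        }
        return (System.nanoTime() - start) / n / 1_000; // microseconds per pause
    }

    public static void main(String[] args) {
        // Request 50 us pauses; compare the printed average with the request
        // to see the actual resolution on your system.
        System.out.println("avg pause: " + avgPauseMicros(50_000L, 1_000) + " us");
    }
}
```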