Suppose we execute Thread.sleep(1) within a loop iterating n times (here and below: Java 11):
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
public class ThreadSleep1Benchmark {

    @Param({"5", "10", "50"})
    long delay;

    @Benchmark
    public int sleep() throws Exception {
        for (int i = 0; i < delay; i++) {
            Thread.sleep(1);
        }
        return hashCode();
    }
}
This benchmark demonstrates the following results:
Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   6.552 ± 0.071  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  13.343 ± 0.227  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  68.059 ± 1.441  ms/op
Here we see that the sleep() method takes noticeably more than n milliseconds, while intuitively we would expect it to take ~n ms, since at each iteration the current thread sleeps for 1 ms. This example demonstrates the cost of putting a thread to sleep and waking it up.
Let's now modify the benchmark:
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
public class ThreadSleep2Benchmark {

    private final ExecutorService executor = Executors.newFixedThreadPool(1);

    volatile boolean flag;

    @Param({"5", "10", "50"})
    long delay;

    @Setup(Level.Invocation)
    public void setUp() {
        flag = true;
        startThread();
    }

    @TearDown(Level.Trial)
    public void tearDown() {
        executor.shutdown();
    }

    @Benchmark
    public int sleep() throws Exception {
        while (flag) {
            Thread.sleep(1);
        }
        return hashCode();
    }

    private void startThread() {
        executor.submit(() -> {
            try {
                Thread.sleep(delay);
                flag = false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        });
    }
}
Here we run a background thread that waits for n milliseconds and then lowers the flag, while the sleep() method spins in the while (flag) loop, sleeping 1 ms per iteration. Since the flag is lowered after a delay of n milliseconds, we expect the loop to iterate approximately n times.
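The same "how many iterations actually fit" question can be sketched without the flag at all, by counting Thread.sleep(1) calls inside a fixed time window (a simplified illustration, not the benchmark code; the 50 ms window is an arbitrary choice):

```java
// Sketch: count how many Thread.sleep(1) iterations fit into a fixed window.
// If each sleep(1) really took 1 ms, a 50 ms window would fit ~50 iterations;
// fewer iterations means each call overshoots 1 ms.
public class SleepIterations {

    static int iterationsWithin(long windowMillis) throws InterruptedException {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        int count = 0;
        while (System.nanoTime() < deadline) {
            Thread.sleep(1);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(iterationsWithin(50) + " iterations in 50 ms");
    }
}
```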
And again we see the costs of Thread.sleep(1), but they appear to be almost the same for delays of 5 and 10, and significantly lower when delay is 50. Note that the difference between the two benchmarks is not linear: it is ~0.1 ms for 5, ~1.2 ms for 10 and ~13 ms for 50.
Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep2Benchmark.sleep        5  avgt   50   6.760 ± 0.070  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  12.496 ± 0.050  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  54.727 ± 0.599  ms/op
On Java 18 the results are similar:
Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   6.609 ± 0.105  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  13.233 ± 0.148  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  66.017 ± 0.714  ms/op
ThreadSleep2Benchmark.sleep        5  avgt   50   6.740 ± 0.067  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  12.400 ± 0.112  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  53.836 ± 0.250  ms/op
So my question is: is the cost reduction in ThreadSleep2Benchmark the compiler's achievement (loop unrolling, etc.), or does it come from how I iterate over the loops?
UPD
On Linux I got the following results:
Java 11
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep1Benchmark.sleep 5 avgt 50 5.597 ± 0.038 ms/op
ThreadSleep1Benchmark.sleep 10 avgt 50 11.263 ± 0.069 ms/op
ThreadSleep1Benchmark.sleep 50 avgt 50 56.079 ± 0.267 ms/op
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep2Benchmark.sleep 5 avgt 50 5.600 ± 0.032 ms/op
ThreadSleep2Benchmark.sleep 10 avgt 50 10.558 ± 0.052 ms/op
ThreadSleep2Benchmark.sleep 50 avgt 50 50.625 ± 0.049 ms/op
Java 18
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep1Benchmark.sleep 5 avgt 50 5.581 ± 0.041 ms/op
ThreadSleep1Benchmark.sleep 10 avgt 50 11.069 ± 0.067 ms/op
ThreadSleep1Benchmark.sleep 50 avgt 50 55.719 ± 0.602 ms/op
Benchmark (delay) Mode Cnt Score Error Units
ThreadSleep2Benchmark.sleep 5 avgt 50 5.574 ± 0.035 ms/op
ThreadSleep2Benchmark.sleep 10 avgt 50 10.918 ± 0.035 ms/op
ThreadSleep2Benchmark.sleep 50 avgt 50 50.823 ± 0.055 ms/op
If you want more control over pausing a Java thread, have a look at LockSupport.parkNanos. Under Linux, by default, you can get 50 µs resolution. For more info and how to tune it, see https://hazelcast.com/blog/locksupport-parknanos-under-the-hood-and-the-curious-case-of-parking/
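A minimal sketch of that suggestion (the class name and the 50 µs request are illustrative; parkNanos may also return spuriously, so the measured pause is best-effort):

```java
import java.util.concurrent.locks.LockSupport;

// Sketch: LockSupport.parkNanos as a finer-grained pause than Thread.sleep.
// It takes a nanosecond argument, but the achievable resolution still
// depends on the OS and its timer configuration.
public class ParkNanosDemo {

    static long avgPauseMicros(long requestNanos, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            LockSupport.parkNanos(requestNanos); // may return early/spuriously
        }
        return (System.nanoTime() - start) / n / 1_000; // microseconds per pause
    }

    public static void main(String[] args) {
        // Request 50 us pauses; compare the printed average with the request
        // to see the actual resolution on your system.
        System.out.println("avg pause: " + avgPauseMicros(50_000L, 1_000) + " us");
    }
}
```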