Akka Http Performance tuning

Question

I am performing Load testing on Akka-http framework(version: 10.0), I am using wrk tool. wrk command:

wrk -t6 -c10000 -d 60s --timeout 10s --latency http://localhost:8080/hello

first run without any blocking call,

object WebServer {

  implicit val system = ActorSystem("my-system")
  implicit val materializer = ActorMaterializer()
  implicit val executionContext = system.dispatcher
  def main(args: Array[String]) {


    val bindingFuture = Http().bindAndHandle(router.route, "localhost", 8080)

    println(
      s"Server online at http://localhost:8080/\nPress RETURN to stop...")
    StdIn.readLine() // let it run until user presses return
    bindingFuture
      .flatMap(_.unbind()) // trigger unbinding from the port
      .onComplete(_ => system.terminate()) // and shutdown when done
  }
}

object router {
  implicit val executionContext = WebServer.executionContext


  val route =
    path("hello") {
      get {
        complete {
        "Ok"
        }
      }
    }
}

output of wrk:

    Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.22ms   16.41ms   2.08s    98.30%
    Req/Sec     9.86k     6.31k   25.79k    62.56%
  Latency Distribution
     50%    3.14ms
     75%    3.50ms
     90%    4.19ms
     99%   31.08ms
  3477084 requests in 1.00m, 477.50MB read
  Socket errors: connect 9751, read 344, write 0, timeout 0
Requests/sec:  57860.04
Transfer/sec:      7.95MB

Now if i add a future call in the route and run the test again.

val route =
    path("hello") {
      get {
        complete {
          Future { // Blocking code
            Thread.sleep(100)
            "OK"
          }
        }
      }
    }

Output, of wrk:

Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   527.07ms  491.20ms  10.00s    88.19%
    Req/Sec    49.75     39.55   257.00     69.77%
  Latency Distribution
     50%  379.28ms
     75%  632.98ms
     90%    1.08s 
     99%    2.07s 
  13744 requests in 1.00m, 1.89MB read
  Socket errors: connect 9751, read 385, write 38, timeout 98
Requests/sec:    228.88
Transfer/sec:     32.19KB

As you can see with future call only 13744 requests are being served .

After following Akka documentation , I added a separate dispatcher thread pool for the route which creates max, of 200 threads .

implicit val executionContext = WebServer.system.dispatchers.lookup("my-blocking-dispatcher")
// config of dispatcher
my-blocking-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    // or in Akka 2.4.2+
    fixed-pool-size = 200
  }
  throughput = 1
}

After the above change, the performance improved a bit

Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   127.03ms   21.10ms 504.28ms   84.30%
    Req/Sec   320.89    175.58   646.00     60.01%
  Latency Distribution
     50%  122.85ms
     75%  135.16ms
     90%  147.21ms
     99%  190.03ms
  114378 requests in 1.00m, 15.71MB read
  Socket errors: connect 9751, read 284, write 0, timeout 0
Requests/sec:   1903.01
Transfer/sec:    267.61KB

In the my-blocking-dispatcher config if I increase the pool size above 200 the performance is same.

Now, what other parameters or config should I use to increase the performance while using future call.So that app gives the maximum throughput.

Answer 1

Some disclaimers first: I haven't worked with wrk tool before, so I might get something wrong. Here are assumptions I've made for this answer:

Connections count is independent from threads count, ie if I specify -t4 -c10000 it keeps 10000 connections, not 4 * 10000.
For every connection the behavior is as follows: it sends the request, receives the response completely, and immediately sends the next one, etc., until the time runs out.

Also I've run the server on the same machine as wrk, and my machine seems to be weaker than yours (I have only dual-core CPU), so I've reduced wrk's thread counts to 2, and connection count to 1000, to get decent results.

The Akka Http version I've used is the 10.0.1 , and wrk version is 4.0.2 .

Now to the answer. Let's look at the blocking code you have:

Future { // Blocking code
  Thread.sleep(100)
  "OK"
}

This means, every request will take at least 100 milliseconds. If you have 200 threads, and 1000 connections, the timeline will be as follows:

Msg: 0       200      400      600      800     1000     1200      2000
     |--------|--------|--------|--------|--------|--------|---..---|---...
Ms:  0       100      200      300      400      500      600      1000

Where Msg is amount of processed messages, Ms is elapsed time in milliseconds.

This gives us 2000 messages processed per second, or ~60000 messages per 30 seconds, which mostly agrees to the test figures:

wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
  2 threads and 1000 connections
  Thread Stats   Avg     Stdev     Max   +/- Stdev
    Latency   412.30ms   126.87ms 631.78ms   82.89%
    Req/Sec     0.95k    204.41     1.40k    75.73%
  Latency Distribution
     50%  455.18ms
     75%  512.93ms
     90%  517.72ms
     99%  528.19ms
here: --> 56104 requests in 30.09s <--, 7.70MB read
  Socket errors: connect 0, read 1349, write 14, timeout 0
Requests/sec:   1864.76
Transfer/sec:    262.23KB

It is also obvious that this number (2000 messages per second) is strictly bound by the threads count. Eg if we would have 300 threads, we'd process 300 messages every 100 ms, so we'd have 3000 messages per second, if our system can handle so many threads. Let's see how we'll fare if we provide 1 thread per connection, ie 1000 threads in pool:

wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
  2 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   107.08ms   16.86ms 582.44ms   97.24%
    Req/Sec     3.80k     1.22k    5.05k    79.28%
  Latency Distribution
     50%  104.77ms
     75%  106.74ms
     90%  110.01ms
     99%  155.24ms
  223751 requests in 30.08s, 30.73MB read
  Socket errors: connect 0, read 1149, write 1, timeout 0
Requests/sec:   7439.64
Transfer/sec:      1.02MB

As you can see, now one request takes almost exactly 100ms on average, ie the same amount we put into Thread.sleep . It seems we can't get much faster than this! Now we're pretty much in standard situation of one thread per request , which worked pretty well for many years until the asynchronous IO let servers scale up much higher.

For the sake of comparison, here's the fully non-blocking test results on my machine with default fork-join thread pool:

complete {
  Future {
    "OK"
  }
}

====>

wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
  2 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.50ms   14.35ms 468.11ms   93.43%
    Req/Sec    22.00k     5.99k   34.67k    72.95%
  Latency Distribution
     50%   13.16ms
     75%   18.77ms
     90%   25.72ms
     99%   66.65ms
  1289402 requests in 30.02s, 177.07MB read
  Socket errors: connect 0, read 1103, write 42, timeout 0
Requests/sec:  42946.15
Transfer/sec:      5.90MB

To summarize, if you use blocking operations, you need one thread per request to achieve the best throughput, so configure your thread pool accordingly. There are natural limits for how many threads your system can handle, and you might need to tune your OS for maximum threads count. For best throughput, avoid blocking operations.

Also don't confuse asynchronous operations with non-blocking ones. Your code with Future and Thread.sleep is a perfect example of asynchronous, but blocking operation. Lots of popular software operates in this mode (some legacy HTTP clients, Cassandra drivers, AWS Java SDKs, etc.). To fully reap the benefits of non-blocking HTTP server, you need to be non-blocking all the way down, not just asynchronous. It might not be always possible, but it's something to strive for.

Answer 2

I get x3 performance on my localhost with this config:

akka {
  actor {
    default-dispatcher {
      fork-join-executor {
        parallelism-min = 1
        parallelism-max = 64
        parallelism-factor = 1
      }
      throughput = 64
    }
  }

  http {
    host-connection-pool {
      max-connections = 10000
      max-open-requests = 4096
    }

    server {
      pipelining-limit = 1024
      max-connections = 4096
      backlog = 1024
    }
  }
}

Maybe other values for these params will make even better (write to me pls if yes).

Akka Http version 10.1.12.

Akka Http Performance tuning

Question

2 answers

solution1
29 ACCPTED 2016-12-23 16:13:34

solution2
0 2020-06-25 21:05:56

Akka Http Performance tuning

Question

2 answers

solution1 29 ACCPTED 2016-12-23 16:13:34

solution2 0 2020-06-25 21:05:56

solution1
29 ACCPTED 2016-12-23 16:13:34

solution2
0 2020-06-25 21:05:56