Akka HTTP performance tuning
I am load testing the Akka HTTP framework (version 10.0) with the wrk tool. The wrk command:
wrk -t6 -c10000 -d 60s --timeout 10s --latency http://localhost:8080/hello
The first run has no blocking call:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import scala.io.StdIn

object WebServer {
  implicit val system = ActorSystem("my-system")
  implicit val materializer = ActorMaterializer()
  implicit val executionContext = system.dispatcher

  def main(args: Array[String]): Unit = {
    val bindingFuture = Http().bindAndHandle(router.route, "localhost", 8080)
    println(
      s"Server online at http://localhost:8080/\nPress RETURN to stop...")
    StdIn.readLine() // let it run until user presses return
    bindingFuture
      .flatMap(_.unbind()) // trigger unbinding from the port
      .onComplete(_ => system.terminate()) // and shutdown when done
  }
}

object router {
  implicit val executionContext = WebServer.executionContext
  val route =
    path("hello") {
      get {
        complete {
          "Ok"
        }
      }
    }
}
Output of wrk:
Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.22ms   16.41ms   2.08s    98.30%
    Req/Sec     9.86k     6.31k   25.79k    62.56%
  Latency Distribution
     50%    3.14ms
     75%    3.50ms
     90%    4.19ms
     99%   31.08ms
  3477084 requests in 1.00m, 477.50MB read
  Socket errors: connect 9751, read 344, write 0, timeout 0
Requests/sec:  57860.04
Transfer/sec:      7.95MB
Now I add a Future call to the route and run the test again:
val route =
  path("hello") {
    get {
      complete {
        Future { // Blocking code
          Thread.sleep(100)
          "OK"
        }
      }
    }
  }
Output of wrk:
Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   527.07ms  491.20ms  10.00s    88.19%
    Req/Sec    49.75     39.55   257.00     69.77%
  Latency Distribution
     50%  379.28ms
     75%  632.98ms
     90%    1.08s
     99%    2.07s
  13744 requests in 1.00m, 1.89MB read
  Socket errors: connect 9751, read 385, write 38, timeout 98
Requests/sec:    228.88
Transfer/sec:     32.19KB
As you can see, with the Future call only 13744 requests are served.
Following the Akka documentation, I added a separate dispatcher thread pool for the route, which creates a maximum of 200 threads.
implicit val executionContext = WebServer.system.dispatchers.lookup("my-blocking-dispatcher")

// config of dispatcher
my-blocking-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    // or in Akka 2.4.2+
    fixed-pool-size = 200
  }
  throughput = 1
}
After the above change, performance improved a bit:
Running 1m test @ http://localhost:8080/hello
  6 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   127.03ms   21.10ms 504.28ms   84.30%
    Req/Sec   320.89    175.58   646.00     60.01%
  Latency Distribution
     50%  122.85ms
     75%  135.16ms
     90%  147.21ms
     99%  190.03ms
  114378 requests in 1.00m, 15.71MB read
  Socket errors: connect 9751, read 284, write 0, timeout 0
Requests/sec:   1903.01
Transfer/sec:    267.61KB
If I increase the pool size in the my-blocking-dispatcher config above 200, the performance stays the same.
What other parameters or configuration should I use to increase performance while using the Future call, so that the app gives maximum throughput?
Some disclaimers first: I haven't worked with the wrk tool before, so I might get something wrong. Here are the assumptions I've made for this answer: with -t4 -c10000, wrk keeps 10000 connections in total, not 4 * 10000. Also, I've run the server on the same machine as wrk, and my machine seems to be weaker than yours (I have only a dual-core CPU), so I've reduced wrk's thread count to 2 and the connection count to 1000 to get decent results.
The Akka HTTP version I've used is 10.0.1, and the wrk version is 4.0.2.
Now to the answer. Let's look at the blocking code you have:
Future { // Blocking code
  Thread.sleep(100)
  "OK"
}
This means every request takes at least 100 milliseconds. If you have 200 threads and 1000 connections, the timeline will be as follows:
Msg: 0        200      400      600      800      1000     1200     2000
     |--------|--------|--------|--------|--------|--------|---..---|---...
Ms:  0        100      200      300      400      500      600      1000
Where Msg is the number of processed messages and Ms is the elapsed time in milliseconds.
This gives us 2000 messages processed per second, or ~60000 messages per 30 seconds, which mostly agrees with the test figures:
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
  2 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   412.30ms  126.87ms 631.78ms   82.89%
    Req/Sec     0.95k   204.41     1.40k    75.73%
  Latency Distribution
     50%  455.18ms
     75%  512.93ms
     90%  517.72ms
     99%  528.19ms
  here: --> 56104 requests in 30.09s <--, 7.70MB read
  Socket errors: connect 0, read 1349, write 14, timeout 0
Requests/sec:   1864.76
Transfer/sec:    262.23KB
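The estimate behind the timeline can be written down as a tiny calculation. This is only a sketch: the object and method names are made up for illustration, and it assumes every request blocks exactly one pool thread for the whole sleep time and that nothing else limits the server.

```scala
object BlockingThroughput {
  // Upper bound on requests per second when each request occupies one pool
  // thread for blockMillis milliseconds (hypothetical helper, not an Akka API).
  def maxRequestsPerSecond(poolSize: Int, blockMillis: Long): Double =
    poolSize * 1000.0 / blockMillis

  def main(args: Array[String]): Unit = {
    // 200 is the pool size from the question; the other sizes are for comparison.
    Seq(200, 300, 1000).foreach { threads =>
      val bound = maxRequestsPerSecond(threads, 100)
      println(f"$threads%4d threads -> at most $bound%.0f requests/sec")
    }
  }
}
```

For 200 threads and a 100 ms sleep the bound is 2000 requests per second, which is close to the 1864.76 measured above.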
It is also obvious that this number (2000 messages per second) is strictly bound by the thread count. E.g. if we had 300 threads, we'd process 300 messages every 100 ms, so we'd get 3000 messages per second, provided our system can handle that many threads. Let's see how we fare if we provide 1 thread per connection, i.e. 1000 threads in the pool:
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
  2 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   107.08ms   16.86ms 582.44ms   97.24%
    Req/Sec     3.80k     1.22k    5.05k    79.28%
  Latency Distribution
     50%  104.77ms
     75%  106.74ms
     90%  110.01ms
     99%  155.24ms
  223751 requests in 30.08s, 30.73MB read
  Socket errors: connect 0, read 1149, write 1, timeout 0
Requests/sec:   7439.64
Transfer/sec:      1.02MB
As you can see, one request now takes almost exactly 100 ms on average, i.e. the same amount we put into Thread.sleep. It seems we can't get much faster than this! We're now pretty much in the standard situation of one thread per request, which worked pretty well for many years until asynchronous IO let servers scale up much higher.
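The effect of pool size on blocking work can be reproduced without Akka or wrk. The sketch below uses only the Scala and Java standard libraries; PoolSizeDemo and timeBlockingTasks are hypothetical names, and the 1000 tasks merely stand in for the 1000 wrk connections.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PoolSizeDemo {
  // Runs `tasks` blocking calls of `blockMillis` each on a fixed pool of
  // `poolSize` threads and returns the wall-clock time in milliseconds.
  def timeBlockingTasks(poolSize: Int, tasks: Int, blockMillis: Long): Long = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val started = System.nanoTime()
      val all = Future.sequence((1 to tasks).map { _ =>
        Future { Thread.sleep(blockMillis); "OK" } // same blocking body as the route
      })
      Await.result(all, 1.minute)
      (System.nanoTime() - started) / 1000000
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    Seq(200, 1000).foreach { size =>
      val elapsed = timeBlockingTasks(size, tasks = 1000, blockMillis = 100)
      println(s"pool of $size threads: 1000 blocking tasks took about $elapsed ms")
    }
  }
}
```

With 200 threads the 1000 tasks have to run in five rounds of roughly 100 ms each, while with 1000 threads they can all sleep at the same time, so the second run should finish several times faster. Exact timings depend on the machine.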
For the sake of comparison, here are the fully non-blocking test results on my machine with the default fork-join thread pool:
complete {
  Future {
    "OK"
  }
}
====>
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
  2 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.50ms   14.35ms 468.11ms   93.43%
    Req/Sec    22.00k     5.99k   34.67k    72.95%
  Latency Distribution
     50%   13.16ms
     75%   18.77ms
     90%   25.72ms
     99%   66.65ms
  1289402 requests in 30.02s, 177.07MB read
  Socket errors: connect 0, read 1103, write 42, timeout 0
Requests/sec:  42946.15
Transfer/sec:      5.90MB
To summarize, if you use blocking operations, you need one thread per request to achieve the best throughput, so configure your thread pool accordingly. There are natural limits to how many threads your system can handle, and you might need to tune your OS for the maximum thread count. For best throughput, avoid blocking operations.
Also, don't confuse asynchronous operations with non-blocking ones. Your code with Future and Thread.sleep is a perfect example of an asynchronous but blocking operation. Lots of popular software operates in this mode (some legacy HTTP clients, Cassandra drivers, AWS Java SDKs, etc.). To fully reap the benefits of a non-blocking HTTP server, you need to be non-blocking all the way down, not just asynchronous. It might not always be possible, but it's something to strive for.
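To make the difference between asynchronous and non-blocking concrete, here is a minimal sketch of a 100 ms delay that does not hold a thread while it waits. It uses only the standard library; NonBlockingDelay and delayed are hypothetical names, and a single scheduler thread stands in for whatever timer a real server would use (in an Akka application, akka.pattern.after provides a comparable scheduler-based delay).

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object NonBlockingDelay {
  // One timer thread serves every pending delay; no thread sleeps per request.
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Completes the returned Future with `value` after `delay`, without blocking.
  def delayed[T](delay: FiniteDuration)(value: => T): Future[T] = {
    val promise = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit = promise.complete(Try(value))
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    promise.future
  }

  def shutdown(): Unit = scheduler.shutdown()

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val started = System.nanoTime()
    val all = Future.sequence((1 to 1000).map(i => delayed(100.millis)(s"OK $i")))
    val results = Await.result(all, 5.seconds)
    val elapsedMs = (System.nanoTime() - started) / 1000000
    println(s"${results.size} delayed results in about $elapsedMs ms")
    shutdown()
  }
}
```

Here 1000 delayed results share one timer thread, whereas the Thread.sleep version needs 1000 threads to finish in the same time. This only helps when the underlying work really is a wait (a timer, or IO done through a non-blocking client); a blocking library call still ties up a thread.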
I get 3x the performance on my localhost with this config:
akka {
  actor {
    default-dispatcher {
      fork-join-executor {
        parallelism-min = 1
        parallelism-max = 64
        parallelism-factor = 1
      }
      throughput = 64
    }
  }
  http {
    host-connection-pool {
      max-connections = 10000
      max-open-requests = 4096
    }
    server {
      pipelining-limit = 1024
      max-connections = 4096
      backlog = 1024
    }
  }
}
Maybe other values for these parameters will do even better (please write to me if so).
Akka HTTP version 10.1.12.