Rust vs Go concurrent webserver, why is Rust slow here?

Question

I was trying out some benchmarking of the multi-threaded webserver example in the Rust book and for comparison I built something similar in Go and ran a benchmark using ApacheBench. Though its a simple example the difference was way too much. Go web server doing the same was 10 times faster. Since I was expecting Rust to be faster or at same level, I tried multiple revisions using futures and smol (Though my goal was to compare implementations using only standard library) but result was almost the same. Can anyone here suggest changes to the Rust implementation to make it faster without using a huge thread count?

Here is the code I used: https://github.com/deepu105/concurrency-benchmarks

The tokio-http version is the slowest, the other 3 rust versions give almost same result

Here are the benchmarks:

Rust (with 8 threads, with 100 threads the numbers are closer to Go):

❯ ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        176 bytes

Concurrency Level:      100
Time taken for tests:   26.027 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      195000 bytes
HTML transferred:       176000 bytes
Requests per second:    38.42 [#/sec] (mean)
Time per request:       2602.703 [ms] (mean)
Time per request:       26.027 [ms] (mean, across all concurrent requests)
Transfer rate:          7.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   2.9      1      16
Processing:     4 2304 1082.5   2001    5996
Waiting:        0 2303 1082.7   2001    5996
Total:          4 2307 1082.1   2002    5997

Percentage of the requests served within a certain time (ms)
  50%   2002
  66%   2008
  75%   2018
  80%   3984
  90%   3997
  95%   4002
  98%   4005
  99%   5983
 100%   5997 (longest request)

Go:

ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        174 bytes

Concurrency Level:      100
Time taken for tests:   2.102 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      291000 bytes
HTML transferred:       174000 bytes
Requests per second:    475.84 [#/sec] (mean)
Time per request:       210.156 [ms] (mean)
Time per request:       2.102 [ms] (mean, across all concurrent requests)
Transfer rate:          135.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.4      2       5
Processing:     0  203 599.8      3    2008
Waiting:        0  202 600.0      2    2008
Total:          0  205 599.8      5    2013

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      7
  75%      8
  80%      8
  90%   2000
  95%   2003
  98%   2005
  99%   2010
 100%   2013 (longest request)

Answer 1

I only compared your "rustws" and the Go version. In Go you have unlimited goroutines (even though you limit them all to only one CPU core) while in rustws you create a thread pool with 8 threads.

Since your request handlers sleep 2 seconds for every 10th request you are limiting the rustws version to 80/2 = 40 requests per second which is what you are seeing in the ab results. Go does not suffer from this arbitrary bottleneck so it shows you the maximum it candle handle on a single CPU core.

Answer 2

I was finally able to get similar results in Rust using the async_std lib

❯ ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        176 bytes

Concurrency Level:      100
Time taken for tests:   2.094 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      195000 bytes
HTML transferred:       176000 bytes
Requests per second:    477.47 [#/sec] (mean)
Time per request:       209.439 [ms] (mean)
Time per request:       2.094 [ms] (mean, across all concurrent requests)
Transfer rate:          90.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.7      2       7
Processing:     0  202 599.7      2    2002
Waiting:        0  201 600.1      1    2002
Total:          0  205 599.7      5    2007

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      6
  75%      9
  80%      9
  90%   2000
  95%   2003
  98%   2004
  99%   2006
 100%   2007 (longest request)

Here is the implementation

use async_std::net::TcpListener;
use async_std::net::TcpStream;
use async_std::prelude::*;
use async_std::task;
use std::fs;
use std::time::Duration;

#[async_std::main]
async fn main() {
    let mut count = 0;

    let listener = TcpListener::bind("127.0.0.1:8080").await.unwrap(); // set listen port

    loop {
        count = count + 1;
        let count_n = Box::new(count);
        let (stream, _) = listener.accept().await.unwrap();
        task::spawn(handle_connection(stream, count_n)); // spawn a new task to handle the connection
    }
}

async fn handle_connection(mut stream: TcpStream, count: Box<i64>) {
    // Read the first 1024 bytes of data from the stream
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).await.unwrap();

    // add 2 second delay to every 10th request
    if (*count % 10) == 0 {
        println!("Adding delay. Count: {}", count);
        task::sleep(Duration::from_secs(2)).await;
    }

    let contents = fs::read_to_string("hello.html").unwrap(); // read html file

    let response = format!("{}{}", "HTTP/1.1 200 OK\r\n\r\n", contents);
    stream.write(response.as_bytes()).await.unwrap(); // write response
    stream.flush().await.unwrap();
}

Rust vs Go concurrent webserver, why is Rust slow here?

Question

2 answers

solution1
2 2020-11-26 19:41:56

solution2
2 ACCPTED 2020-11-26 22:02:28

Rust vs Go concurrent webserver, why is Rust slow here?

Question

2 answers

solution1 2 2020-11-26 19:41:56

solution2 2 ACCPTED 2020-11-26 22:02:28

solution1
2 2020-11-26 19:41:56

solution2
2 ACCPTED 2020-11-26 22:02:28