
Why is swapping elements of a []float64 in Go faster than swapping elements of a Vec<f64> in Rust?

I have two (equivalent?) programs, one in Go and the other in Rust. The average execution times are:

  • Go ~169ms
  • Rust ~201ms

Go

package main

import (
    "fmt"
    "time"
)

func main() {
    work := []float64{0.00, 1.00}
    start := time.Now()

    for i := 0; i < 100000000; i++ {
        work[0], work[1] = work[1], work[0]
    }

    elapsed := time.Since(start)
    fmt.Println("Execution time: ", elapsed)
}

Rust

I compiled with --release

use std::time::Instant;

fn main() {
    let mut work: Vec<f64> = Vec::new();
    work.push(0.00);
    work.push(1.00);

    let now = Instant::now();

    for _x in 1..100000000 {
        work.swap(0, 1); 
    }

    let elapsed = now.elapsed();
    println!("Execution time: {:?}", elapsed);
}

Is Rust less performant than Go in this instance? Could the Rust program be written in an idiomatic way, to execute faster?

Could the Rust program be written in an idiomatic way, to execute faster?

Yes. To create a vector with a few elements, use the vec![] macro:

let mut work: Vec<f64> = vec![0.0, 1.0];    

for _x in 1..100000000 {
    work.swap(0, 1); 
}

So is this code faster? Yes. Have a look at the assembly that is generated:

example::main:
  mov eax, 99999999
.LBB0_1:
  add eax, -11
  jne .LBB0_1
  ret

On my PC, this runs about 30 times faster than your original code.

Why does the assembly still contain this loop that is doing nothing? Why isn't the compiler able to see that the two pushes are the same as vec![0.0, 1.0]? Both are very good questions, and both probably point to a missed optimization in LLVM or the Rust compiler.

However, sadly, there isn't much useful information to gain from your micro-benchmark. Benchmarking is hard, like really hard. There are so many pitfalls that even professionals fall into them. In your case, the benchmark is flawed in several ways. For a start, you never observe the contents of the vector afterwards (it is never used). That's why a good compiler can remove all code that even touches the vector, as the Rust compiler did above. So that's not good.
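(Not part of the original answer, just a sketch of one way to fix this particular flaw.) In Rust you can route the data through std::hint::black_box, which has been stable since Rust 1.66 and acts as an opaque use of the value, so the optimizer can no longer remove the swaps as dead code:

use std::hint::black_box;
use std::time::Instant;

fn main() {
    let mut work: Vec<f64> = vec![0.0, 1.0];
    let now = Instant::now();

    for _ in 0..100_000_000 {
        work.swap(0, 1);
        // black_box is an opaque "use" of the vector, so the optimizer
        // cannot prove the swaps are dead and delete the loop body.
        black_box(&mut work);
    }

    let elapsed = now.elapsed();
    println!("Execution time: {:?}", elapsed);
    println!("Final contents: {:?}", work);
}

With the vector actually observed inside the loop, the generated code has to keep the swaps, and the measured time reflects real work instead of an empty countdown loop.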

Apart from that, this does not resemble any real performance-critical code. Even if the vector were observed later, swapping an odd number of times equals a single swap. So unless you wanted to see whether the optimizer understands this swapping rule, your benchmark sadly isn't really useful.
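To make that rule concrete, here is a tiny hypothetical helper showing what the whole loop collapses to once the optimizer understands it: an even number of swaps of the same two elements is a no-op, and an odd number is a single swap.

// Hypothetical illustration: n repeated swaps of the same two
// elements reduce to at most one swap.
fn swap_n_times(work: &mut [f64], n: u64) {
    if n % 2 == 1 {
        work.swap(0, 1);
    }
}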

(Not an answer) but to augment what Lukas wrote, here's what Go 1.11 generates for the loop itself:

    xorl    CX, CX
    movsd   8(AX), X0
    movsd   (AX), X1
    movsd   X0, (AX)
    movsd   X1, 8(AX)
    incq    CX
    cmpq    CX, $100000000
    jlt     68

(Courtesy of https://godbolt.org)

In either case, note that most probably the time you measured was dominated by the startup and initialization of the process, so you did not actually measure the speed of the loops themselves. In other words, your approach is not correct.
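(Also not from the original thread, just a hedged sketch.) If you want numbers that are not dominated by process startup or by a single noisy run, a benchmarking harness such as the Criterion crate handles warm-up, repetition, and statistical analysis for you. This assumes criterion is added as a dev-dependency and the file is registered as a bench target (e.g. benches/swap.rs with harness = false in Cargo.toml):

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn swap_benchmark(c: &mut Criterion) {
    c.bench_function("swap two f64 elements", |b| {
        let mut work: Vec<f64> = vec![0.0, 1.0];
        b.iter(|| {
            work.swap(0, 1);
            // Keep the vector observable so the swap is not optimized away.
            black_box(&mut work);
        })
    });
}

criterion_group!(benches, swap_benchmark);
criterion_main!(benches);

Run with cargo bench; Criterion then reports timing statistics over many iterations instead of a single wall-clock sample taken inside main.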
