
Why is this Rust program so slow? Did I miss something?

I read Minimal distance in Manhattan metric and rewrote the author's "naive" implementation in Rust. The C++ variant is:

#include <utility>
#include <cstdio>
#include <cstdlib>

std::pair<int, int> pointsA[1000001];
std::pair<int, int> pointsB[1000001];

int main() {
    int n, t;
    unsigned long long dist;

    scanf("%d", &t);

    while(t-->0) {
        dist = 4000000000LL;
        scanf("%d", &n);

        for(int i = 0; i < n; i++) {
            scanf("%d%d", &pointsA[i].first, &pointsA[i].second);
        }

        for(int i = 0; i < n; i++) {
            scanf("%d%d", &pointsB[i].first, &pointsB[i].second);
        }

        for(int i = 0; i < n ;i++) {
            for(int j = 0; j < n ; j++) {
                if(abs(pointsA[i].first - pointsB[j].first) + abs(pointsA[i].second - pointsB[j].second) < dist)
                    dist = abs(pointsA[i].first - pointsB[j].first) + abs(pointsA[i].second - pointsB[j].second);
            }
        }
        printf("%llu\n", dist);  // dist is unsigned long long, so %llu
    }
}

The Rust variant is:

use std::io;
use std::io::BufReader;
use std::io::BufRead;

fn read_array(stdin: &mut BufReader<io::Stdin>, array_len: usize, points: &mut Vec<(i32, i32)>) {
    let mut line = String::new();
    for _ in 0..array_len {
        line.clear();
        stdin.read_line(&mut line).unwrap();
        let mut item = line.split_whitespace();
        let x = item.next().unwrap().parse().unwrap();
        let y = item.next().unwrap().parse().unwrap();
        points.push((x, y));
    }
}

fn manhattan_dist(a: &(i32, i32), b: &(i32, i32)) -> u32 {
    ((a.0 - b.0).abs() + (a.1 - b.1).abs()) as u32
}

fn main() {
    let mut line = String::new();
    let mut stdin = BufReader::new(io::stdin());
    stdin.read_line(&mut line).unwrap();
    let n_iters = line.trim_right().parse::<usize>().unwrap();
    let mut points_a = Vec::with_capacity(10000);
    let mut points_b = Vec::with_capacity(10000);
    for _ in 0..n_iters {
        line.clear();
        stdin.read_line(&mut line).unwrap();
        let set_len = line.trim_right().parse::<usize>().unwrap();
        points_a.clear();
        points_b.clear();
        read_array(&mut stdin, set_len, &mut points_a);
        read_array(&mut stdin, set_len, &mut points_b);
        let mut dist = u32::max_value();
        for i in points_a.iter() {
            for j in points_b.iter() {
                dist = std::cmp::min(manhattan_dist(i, j), dist);
            }
        }
        println!("{}", dist);
    }
}
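For reference, both programs compute, for each test case, the minimum Manhattan distance over all pairs of points drawn from the two sets A and B. With the generator below (100 test cases, N = 10000 points per set), the naive double loop evaluates 100 × 10000 × 10000 = 10^10 distances, so the run time should be dominated by the inner loop; at the 7.8 s measured for the C++ build, that is roughly 1.3 × 10^9 distance evaluations per second.

```latex
d(p, q) = |p_x - q_x| + |p_y - q_y|, \qquad
D = \min_{a \in A,\; b \in B} d(a, b)
```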

Then, I generated data with a Python script:

import random

ITER = 100
N = 10000
MAX_INT = 1000000

print("%d" % ITER)

for _ in range(0, ITER):
    print("%d" % N)
    for _ in range(0, N):
        print(random.randrange(-MAX_INT, MAX_INT + 1), random.randrange(1, MAX_INT + 1))
    for _ in range(0, N):
        print(random.randrange(-MAX_INT, MAX_INT + 1), random.randrange(-MAX_INT, 0))

And compiled both variants with g++ -Ofast -march=native and rustc -C opt-level=3, respectively. The timings are:

C++

real    0m7.789s
user    0m7.760s
sys     0m0.020s

Rust

real    0m28.589s
user    0m28.570s
sys     0m0.010s

Why is my Rust code nearly four times slower than the C++ variant? I am using Rust 1.12.0-beta.1.

I added time measurements:

let now = SystemTime::now();
line.clear();
stdin.read_line(&mut line).unwrap();
let set_len = line.trim_right().parse::<usize>().unwrap();
points_a.clear();
points_b.clear();
read_array(&mut stdin, set_len, &mut points_a);
read_array(&mut stdin, set_len, &mut points_b);
io_time += now.elapsed().unwrap();

let now = SystemTime::now();
let mut dist = u32::max_value();
for i in points_a.iter() {
    for j in points_b.iter() {
        dist = std::cmp::min(manhattan_dist(i, j), dist);
    }
}
calc_time += now.elapsed().unwrap();

And

writeln!(&mut std::io::stderr(), "io_time: {}, calc_time: {}", io_time.as_secs(), calc_time.as_secs()).unwrap();

prints io_time: 0, calc_time: 27.
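The measurement above uses SystemTime, which is wall-clock time and can jump if the system clock is adjusted. A minimal sketch of the same accumulate-per-phase pattern using std::time::Instant, the monotonic clock meant for measuring elapsed time, might look like the following. The helper name timed is hypothetical and not part of the original code.

```rust
use std::time::{Duration, Instant};

/// Runs `f`, adds its elapsed time to `total`, and returns `f`'s result.
/// Hypothetical helper for illustration only.
fn timed<T, F: FnOnce() -> T>(total: &mut Duration, f: F) -> T {
    // Instant is monotonic, so clock adjustments cannot skew the measurement.
    let start = Instant::now();
    let result = f();
    *total += start.elapsed();
    result
}
```

In the loop, io_time and calc_time would start at Duration::new(0, 0), and each phase would be wrapped, e.g. let dist = timed(&mut calc_time, || { /* nested loop */ });. Note that as_secs() truncates to whole seconds; Duration::as_secs_f64 on newer Rust versions keeps the fractional part.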

I tried nightly, rustc 1.13.0-nightly (e9bc1bac8 2016-08-24):

$ time ./test_rust < data.txt  > test3_res
io_time: 0, calc_time: 19

real    0m19.592s
user    0m19.560s
sys     0m0.020s
$ time ./test1 < data.txt  > test1_res

real    0m7.797s
user    0m7.780s
sys     0m0.010s

So the difference is now about 2.5x (19.6 s vs. 7.8 s) on my Core i7.

The difference is of course -march=native... kind of. Rust has an equivalent through -C target-cpu=native, but this doesn't give the same speed benefit. This is because LLVM is unwilling to vectorize in this context, whereas GCC does. You may note that Clang, a C++ compiler that also uses LLVM, also produces relatively slow code.

To encourage LLVM to vectorize, you can move the main loop into a separate function. Alternatively, you can use a local block. If you write the code carefully as

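let dist = {
    // set_len is the number of points read for this test case.
    // Each point is destructured into plain i32 locals inside the loops.
    let mut dist = i32::max_value();
    for &(a, b) in &points_a[..set_len] {
        for &(c, d) in &points_b[..set_len] {
            dist = std::cmp::min((a - c).abs() + (b - d).abs(), dist);
        }
    }
    dist
} as u32;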

the difference between Rust and C++ is then near-negligible (~4%).
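The first suggestion, moving the hot loop into its own function, might look like the sketch below. The function name min_manhattan is hypothetical, and whether LLVM actually vectorizes it depends on the compiler version and target flags (for example -C target-cpu=native), so it is worth checking the generated assembly rather than assuming it.

```rust
/// Minimum Manhattan distance between any point in `a` and any point in `b`.
/// Hypothetical helper; if either slice is empty the result is i32::MAX as u32.
fn min_manhattan(a: &[(i32, i32)], b: &[(i32, i32)]) -> u32 {
    let mut dist = i32::MAX;
    for &(ax, ay) in a {
        for &(bx, by) in b {
            // With coordinates bounded by 1e6 (as in the generator),
            // each sum is at most 4e6 and cannot overflow i32.
            dist = dist.min((ax - bx).abs() + (ay - by).abs());
        }
    }
    dist as u32
}
```

In the question's main loop, the nested loop would then become let dist = min_manhattan(&points_a, &points_b);.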

The vast majority of the performance you're seeing in C++ is due to the flag -march=native.

This flag is not the equivalent of Cargo's --release flag, which only selects an optimization level (-C opt-level=3 by default). It enables instructions specific to the CPU the code is compiled on, so math in particular is going to be much faster.

Removing that flag puts the C++ code at 19 seconds.

Then there's the unsafety present in the C++ code: none of the input is checked. The Rust code does check it, because you use .unwrap(), and unwrap has a performance cost: there's an assertion, then the code necessary for unwinding, etc.

Using if let instead of raw unwrap calls, or ignoring results where possible, brings the Rust code's time down again.
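A sketch of a line parser that uses if let instead of unwrap, returning None for a malformed line rather than panicking, is shown below. The name parse_point is hypothetical, and skipping bad input is a design choice for illustration, not something this answer prescribes.

```rust
/// Parses an "x y" line into a point, returning None instead of panicking
/// when a field is missing or is not a valid i32.
fn parse_point(line: &str) -> Option<(i32, i32)> {
    let mut fields = line.split_whitespace();
    // Tuple fields are evaluated left to right, so x is taken before y.
    if let (Some(x), Some(y)) = (fields.next(), fields.next()) {
        if let (Ok(x), Ok(y)) = (x.parse::<i32>(), y.parse::<i32>()) {
            return Some((x, y));
        }
    }
    None
}
```

read_array could then push a point only when parse_point returns Some(p).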

Rust: 22 seconds

C++: 19 seconds

Where are the 3 seconds coming from? A bit of playing around leads me to believe it's println! vs. printf, but I don't have hard numbers for the C++ code. What I can say is that the Rust code drops to 13 seconds when I perform the printing outside of the benchmark.
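One way to keep printing out of the measured loop is to collect the per-test distances first and write them all at the end through a buffer. Rust's stdout is line-buffered and println! locks it on every call, so batching avoids that per-line overhead. A minimal sketch, assuming the results have already been collected into a slice; write_results is a hypothetical name, generic over io::Write so it can target a locked stdout or an in-memory buffer.

```rust
use std::io::{self, BufWriter, Write};

/// Writes one distance per line through a BufWriter, so the underlying
/// writer receives a few large writes instead of one write per result.
fn write_results<W: Write>(out: W, results: &[u32]) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for d in results {
        writeln!(out, "{}", d)?;
    }
    // Flush explicitly so write errors are reported instead of ignored on drop.
    out.flush()
}
```

In the program this could be called as let stdout = io::stdout(); write_results(stdout.lock(), &results).unwrap();.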

TL;DR: Your compiler flags are different, and your C++ code is not safe.

I'm definitely not seeing any difference in execution time. On my machine:

C++:

real    0m19.672s
user    0m19.636s
sys     0m0.060s

Rust:

real    0m19.047s
user    0m19.028s
sys     0m0.040s

I compiled the Rust code with rustc -O test.rs -o ./test and the C++ code with g++ -Ofast test.cpp -o test.

I'm running Ubuntu 16.04 with Linux kernel 4.6.3-040603-generic. The laptop I ran this on has an Intel(R) Core(TM) i5-6200U CPU and 8GB of RAM, nothing special.
