Why is this Rust program so slow? Did I miss something?
I read Minimal distance in Manhattan metric and rewrote the author's "naive" implementation in Rust. The C++ variant is:
#include <utility>
#include <cstdio>
#include <cstdlib>

std::pair<int, int> pointsA[1000001];
std::pair<int, int> pointsB[1000001];

int main() {
    int n, t;
    unsigned long long dist;
    scanf("%d", &t);
    while(t-->0) {
        dist = 4000000000LL;
        scanf("%d", &n);
        for(int i = 0; i < n; i++) {
            scanf("%d%d", &pointsA[i].first, &pointsA[i].second);
        }
        for(int i = 0; i < n; i++) {
            scanf("%d%d", &pointsB[i].first, &pointsB[i].second);
        }
        for(int i = 0; i < n ;i++) {
            for(int j = 0; j < n ; j++) {
                if(abs(pointsA[i].first - pointsB[j].first) + abs(pointsA[i].second - pointsB[j].second) < dist)
                    dist = abs(pointsA[i].first - pointsB[j].first) + abs(pointsA[i].second - pointsB[j].second);
            }
        }
        printf("%lld\n", dist);
    }
}
The Rust variant is:
use std::io;
use std::io::BufReader;
use std::io::BufRead;

fn read_array(stdin: &mut BufReader<io::Stdin>, array_len: usize, points: &mut Vec<(i32, i32)>) {
    let mut line = String::new();
    for _ in 0..array_len {
        line.clear();
        stdin.read_line(&mut line).unwrap();
        let mut item = line.split_whitespace();
        let x = item.next().unwrap().parse().unwrap();
        let y = item.next().unwrap().parse().unwrap();
        points.push((x, y));
    }
}

fn manhattan_dist(a: &(i32, i32), b: &(i32, i32)) -> u32 {
    ((a.0 - b.0).abs() + (a.1 - b.1).abs()) as u32
}

fn main() {
    let mut line = String::new();
    let mut stdin = BufReader::new(io::stdin());
    stdin.read_line(&mut line).unwrap();
    let n_iters = line.trim_right().parse::<usize>().unwrap();
    let mut points_a = Vec::with_capacity(10000);
    let mut points_b = Vec::with_capacity(10000);
    for _ in 0..n_iters {
        line.clear();
        stdin.read_line(&mut line).unwrap();
        let set_len = line.trim_right().parse::<usize>().unwrap();
        points_a.clear();
        points_b.clear();
        read_array(&mut stdin, set_len, &mut points_a);
        read_array(&mut stdin, set_len, &mut points_b);
        let mut dist = u32::max_value();
        for i in points_a.iter() {
            for j in points_b.iter() {
                dist = std::cmp::min(manhattan_dist(i, j), dist);
            }
        }
        println!("{}", dist);
    }
}
Then, I generated data with a Python script:
import random

ITER = 100
N = 10000
MAX_INT = 1000000

print("%d" % ITER)
for _ in range(0, ITER):
    print("%d" % N)
    for _ in range(0, N):
        print(random.randrange(-MAX_INT, MAX_INT + 1), random.randrange(1, MAX_INT + 1))
    for _ in range(0, N):
        print(random.randrange(-MAX_INT, MAX_INT + 1), random.randrange(-MAX_INT, 0))
And compiled both variants with g++ -Ofast -march=native and rustc -C opt-level=3 respectively. The timings are:
C++
real 0m7.789s
user 0m7.760s
sys 0m0.020s
Rust
real 0m28.589s
user 0m28.570s
sys 0m0.010s
Why is my Rust code four times slower than the C++ variant? I am using Rust 1.12.0-beta.1.
I added time measurements:
let now = SystemTime::now();
line.clear();
stdin.read_line(&mut line).unwrap();
let set_len = line.trim_right().parse::<usize>().unwrap();
points_a.clear();
points_b.clear();
read_array(&mut stdin, set_len, &mut points_a);
read_array(&mut stdin, set_len, &mut points_b);
io_time += now.elapsed().unwrap();

let now = SystemTime::now();
let mut dist = u32::max_value();
for i in points_a.iter() {
    for j in points_b.iter() {
        dist = std::cmp::min(manhattan_dist(i, j), dist);
    }
}
calc_time += now.elapsed().unwrap();
And writeln!(&mut std::io::stderr(), "io_time: {}, calc_time: {}", io_time.as_secs(), calc_time.as_secs()).unwrap(); prints io_time: 0, calc_time: 27.
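Note that Duration::as_secs returns whole seconds only, so io_time: 0 means "under one second" rather than exactly zero. A minimal sketch of the same measurement with finer resolution follows. It assumes a newer toolchain than the 1.12 beta used above (Duration::as_secs_f64 and eprintln! were added later), and the helper name `timed` is hypothetical, not part of the original program. std::time::Instant is used instead of SystemTime because it is monotonic, so elapsed() cannot fail.

```rust
use std::time::{Duration, Instant};

// Hypothetical helper (not in the original program): run `f`, add its
// wall-clock time to `total`, and hand back whatever `f` returned.
fn timed<T, F: FnOnce() -> T>(total: &mut Duration, f: F) -> T {
    let start = Instant::now();
    let out = f();
    *total += start.elapsed();
    out
}

fn main() {
    let mut calc_time = Duration::new(0, 0);
    // Stand-in workload; in the real program this would be the nested loop.
    let sum = timed(&mut calc_time, || (0..1_000_000u64).sum::<u64>());
    // as_secs_f64 keeps the fractional part that as_secs would drop.
    eprintln!("sum: {}, calc_time: {:.3} s", sum, calc_time.as_secs_f64());
}
```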
I tried nightly rustc 1.13.0-nightly (e9bc1bac8 2016-08-24):
$ time ./test_rust < data.txt > test3_res
io_time: 0, calc_time: 19
real 0m19.592s
user 0m19.560s
sys 0m0.020s
$ time ./test1 < data.txt > test1_res
real 0m7.797s
user 0m7.780s
sys 0m0.010s
So the difference is now roughly 2.5x on my Core i7.
The difference is of course -march=native... kind of. Rust has this through -C target_cpu=native, but this doesn't give the same speed benefit. This is because LLVM is unwilling to vectorize in this context, whereas GCC does. You may note that using Clang, a C++ compiler that also uses LLVM, also produces relatively slow code.
To encourage LLVM to vectorize, you can move the main loop into a separate function. Alternatively, you can use a local block. If you write the code carefully as
let dist = {
    let mut dist = i32::max_value();
    for &(a, b) in &points_a[..n] {
        for &(c, d) in &points_b[..n] {
            dist = std::cmp::min(((a - c).abs() + (b - d).abs()), dist);
        }
    }
    dist
} as u32;
the difference between Rust and C++ is then near-negligible (~4%).
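For the other option mentioned, moving the main loop into a separate function, a minimal sketch might look like the following. The function name `min_manhattan` is hypothetical, and the sketch assumes coordinates small enough (as in the generated data, at most 1000000 in magnitude) that the i32 arithmetic cannot overflow. Whether LLVM actually vectorizes it will depend on the compiler version and flags.

```rust
// Hypothetical helper: the nested loop from the block above, moved into its
// own function so the optimizer sees two plain slices and nothing else.
// Returns i32::max_value() cast to u32 if either slice is empty.
fn min_manhattan(points_a: &[(i32, i32)], points_b: &[(i32, i32)]) -> u32 {
    let mut dist = i32::max_value();
    for &(a, b) in points_a {
        for &(c, d) in points_b {
            dist = std::cmp::min((a - c).abs() + (b - d).abs(), dist);
        }
    }
    dist as u32
}

fn main() {
    // Tiny stand-in data set; the real program fills these from stdin.
    let points_a = vec![(0, 5), (-3, 2)];
    let points_b = vec![(1, -1), (4, -2)];
    println!("{}", min_manhattan(&points_a, &points_b));
}
```

In the original main, the call would replace the nested loop, for example `let dist = min_manhattan(&points_a, &points_b);`.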
The vast majority of the performance you're seeing in C++ is due to the flag -march=native.
This flag is not the equivalent of Rust's --release. It uses instructions specific to the CPU the code is compiled on, so math in particular is going to be way faster.
Removing that flag puts the C++ code at 19 seconds.
Then there's the unsafety present in the C++ code: none of the input is checked. The Rust code does check it, because you use .unwrap(). unwrap has a performance cost: there's an assertion, then the code necessary for unwinding, etc.
Using if lets instead of raw unwraps, or ignoring results where possible, brings the Rust code down again.
Rust: 22 seconds
C++: 19 seconds
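As an illustration of the if let style (one possible reading of the suggestion, not the exact code the timings above were taken with), the point parsing could be written as below. The helper name `parse_point` is hypothetical. Malformed lines are skipped instead of panicking.

```rust
// Hypothetical helper: parse one "x y" line without unwrap.
// Returns None if a field is missing or is not a valid i32.
fn parse_point(line: &str) -> Option<(i32, i32)> {
    let mut items = line.split_whitespace();
    if let (Some(x), Some(y)) = (items.next(), items.next()) {
        if let (Ok(x), Ok(y)) = (x.parse::<i32>(), y.parse::<i32>()) {
            return Some((x, y));
        }
    }
    None
}

fn main() {
    let mut points = Vec::new();
    // Stand-in input lines; the second one is deliberately malformed.
    for line in ["3 -4", "oops", "7 8"].iter() {
        if let Some(p) = parse_point(line) {
            points.push(p);
        }
    }
    println!("{:?}", points);
}
```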
Where are the 3 seconds coming from? A bit of playing around leads me to believe it's println! vs. printf, but I don't have hard numbers for the C++ code. What I can say is that the Rust code drops to 13 seconds when I perform the printing outside of the benchmark.
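One way to take printing out of the measured part is to collect the per-test-case results and write them in one pass at the end through a locked, buffered stdout. This is a sketch under that assumption, not necessarily how the 13-second figure was obtained; the helper name `write_results` is hypothetical, and the `?` operator needs a compiler newer than the 1.12 beta from the question.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical helper: write every result on its own line through a
// buffered writer, so the output costs one flush instead of one per line.
fn write_results<W: Write>(out: W, results: &[u32]) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for r in results {
        writeln!(out, "{}", r)?;
    }
    out.flush()
}

fn main() {
    // Stand-ins for the distances computed in the benchmark loop.
    let results = vec![12, 7, 30];
    let stdout = io::stdout();
    // Locking once avoids re-acquiring the stdout lock for every line.
    write_results(stdout.lock(), &results).unwrap();
}
```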
TLDR: Your compiler flags are different, and your C++ code is not safe.
I'm definitely not seeing any difference in execution time. On my machine,
C++:
real 0m19.672s
user 0m19.636s
sys 0m0.060s
Rust:
real 0m19.047s
user 0m19.028s
sys 0m0.040s
I compiled the Rust code with rustc -O test.rs -o ./test and the C++ code with g++ -Ofast test.cpp -o test.
I'm running Ubuntu 16.04 with Linux Kernel 4.6.3-040603-generic. The laptop I ran this on has an Intel(R) Core(TM) i5-6200U CPU and 8GB of RAM, nothing special.