Java vs Rust performance

I ran a small, identical benchmark in both Java and Rust.

Java:

public class Main {
    private static final int NUM_ITERS = 100;

    public static void main(String[] args) {
        long tInit = System.nanoTime();
        int c = 0;

        for (int i = 0; i < NUM_ITERS; ++i) {
            for (int j = 0; j < NUM_ITERS; ++j) {
                for (int k = 0; k < NUM_ITERS; ++k) {
                    if (i*i + j*j == k*k) {
                        ++c;
                        System.out.println(i + " " + j + " " + k);
                    }
                }
            }
        }

        System.out.println(c);
        System.out.println(System.nanoTime() - tInit);
    }
}

Rust:

use std::time::SystemTime;

const NUM_ITERS: i32 = 100;

fn main() {
    let t_init = SystemTime::now();
    let mut c = 0;

    for i in 0..NUM_ITERS {
        for j in 0..NUM_ITERS {
            for k in 0..NUM_ITERS {
                if i*i + j*j == k*k {
                    c += 1;
                    println!("{} {} {}", i, j, k);
                }
            }
        }
    }

    println!("{}", c);
    println!("{}", t_init.elapsed().unwrap().as_nanos());
}

When NUM_ITERS = 100, Rust outperformed Java, as expected:

Java: 59311348 ns
Rust: 29629242 ns

But for NUM_ITERS = 1000, Rust took much longer and Java was far faster:

Java: 1585835361  ns
Rust: 28623818145 ns

What could be the reason for this? Shouldn't Rust perform better than Java in this case too? Or did I make a mistake in the implementation?

Update

I removed the lines System.out.println(i + " " + j + " " + k); and println!("{} {} {}", i, j, k); from the code. Here are the outputs:

NUM_ITERS = 100
Java: 3843114  ns
Rust: 29072345 ns


NUM_ITERS = 1000
Java: 1014829974  ns
Rust: 28402166953 ns

So, without the println statements, Java performs better than Rust in both cases. I simply want to know why that is the case. Java has the garbage collector running, plus other overheads. Have I not implemented the loops in Rust optimally?

I adjusted your code to eliminate the points of criticism laid out in the comments. The biggest problem is not compiling Rust for production (release mode), which introduces a 50x overhead. Beyond that, I eliminated printing while measuring and properly warmed up the Java code.
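As a rough guard against accidentally benchmarking an unoptimized build, the program can report its own build profile. This is only a sketch: `build_profile` is a hypothetical helper name, and it assumes Cargo's default profiles, where debug assertions are enabled without `--release` and disabled with it.

```rust
/// Hypothetical helper: reports which Cargo profile the binary was most
/// likely built with. ASSUMPTION: default profiles, where `debug_assertions`
/// is on for `cargo run` and off for `cargo run --release`.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if build_profile() == "debug" {
        // Debug builds skip most optimizations and check integer overflow,
        // so their timings say little about release performance.
        eprintln!("warning: debug build, timings are not representative; use `cargo run --release`");
    }
    println!("build profile: {}", build_profile());
}
```

Running it with plain `cargo run` should print the warning, while `cargo run --release` should not.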

I would say that Java and Rust are on par after these changes: they are within 2x of each other, and both have a very low cost per iteration (just a fraction of a nanosecond).

Here is my code:

public class Testing {
    private static final int NUM_ITERS = 1_000;
    private static final int MEASURE_TIMES = 7;

    public static void main(String[] args) {
        for (int i = 0; i < MEASURE_TIMES; i++) {
            System.out.format("%.2f ns per iteration%n", benchmark());
        }
    }

    private static double benchmark() {
        long tInit = System.nanoTime();
        int c = 0;
        for (int i = 0; i < NUM_ITERS; ++i) {
            for (int j = 0; j < NUM_ITERS; ++j) {
                for (int k = 0; k < NUM_ITERS; ++k) {
                    if (i*i + j*j == k*k) {
                        ++c;
                    }
                }
            }
        }
        if (c % 137 == 0) {
            // Use c so its computation can't be elided
            System.out.println("Count is divisible by 137: " + c);
        }
        long tookNanos = System.nanoTime() - tInit;
        return tookNanos / ((double) NUM_ITERS * NUM_ITERS * NUM_ITERS);
    }
}

use std::time::SystemTime;

const NUM_ITERS: i32 = 1000;

fn main() {
    let mut c = 0;

    let t_init = SystemTime::now();
    for i in 0..NUM_ITERS {
        for j in 0..NUM_ITERS {
            for k in 0..NUM_ITERS {
                if i*i + j*j == k*k {
                    c += 1;
                }
            }
        }
    }
    let took_ns = t_init.elapsed().unwrap().as_nanos() as f64;

    let iters = NUM_ITERS as f64;
    println!("{} ns per iteration", took_ns / (iters * iters * iters));
    // Use c to ensure its computation can't be elided by the optimizer
    if c % 137 == 0 {
        println!("Count is divisible by 137: {}", c);
    }
}

I run Java from IntelliJ, with JDK 16. I run Rust from the command line, using cargo run --release.

Example of Java output:

0.98 ns per iteration
0.93 ns per iteration
0.32 ns per iteration
0.34 ns per iteration
0.32 ns per iteration
0.33 ns per iteration
0.32 ns per iteration

Example of Rust output:

0.600314 ns per iteration

While I'm not necessarily surprised to see Java giving a better result (its JIT compiler has been optimized for 20 years now, and there's no object allocation, so no GC), I was puzzled by the overall low cost of an iteration. We can assume the expression i*i + j*j is hoisted out of the inner loop, which leaves just k*k inside it.
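To make the hoisting assumption concrete, here is a sketch of what the loop looks like when the invariant sum is pulled out by hand. The function names `count_naive` and `count_hoisted` are made up for illustration. This is not a claim about what either compiler actually emits, only the source-level equivalent of that transformation.

```rust
/// The triple loop exactly as written in the benchmark.
fn count_naive(n: i32) -> u32 {
    let mut c = 0;
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                if i * i + j * j == k * k {
                    c += 1;
                }
            }
        }
    }
    c
}

/// Same count, with the loop-invariant sum computed once per (i, j) pair,
/// so only the multiplication k * k remains in the innermost loop.
fn count_hoisted(n: i32) -> u32 {
    let mut c = 0;
    for i in 0..n {
        let ii = i * i;
        for j in 0..n {
            let sum = ii + j * j; // does not depend on k
            for k in 0..n {
                if sum == k * k {
                    c += 1;
                }
            }
        }
    }
    c
}

fn main() {
    // Both versions must agree; only the amount of inner-loop work differs.
    assert_eq!(count_naive(50), count_hoisted(50));
    println!("count for n = 50: {}", count_hoisted(50));
}
```

If the optimizer already performs this hoisting, the two versions should time about the same in a release build.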

I used a disassembler to check the code Rust produced. It definitely involves IMUL in the innermost loop. I read this answer, which says Intel has a latency of just 3 CPU cycles for an IMUL instruction. Combine that with multiple ALUs and instruction-level parallelism, and a result of 1 cycle per iteration becomes more plausible.
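The step from nanoseconds to cycles is just a multiplication by the clock frequency. The sketch below uses a hypothetical 3.0 GHz clock, since the actual CPU is not stated here, so the printed cycle counts are illustrative only.

```rust
/// Converts a per-iteration time in nanoseconds to CPU cycles.
/// A clock running at `clock_ghz` GHz completes `clock_ghz` cycles per nanosecond.
fn cycles_per_iteration(ns_per_iter: f64, clock_ghz: f64) -> f64 {
    ns_per_iter * clock_ghz
}

fn main() {
    // ASSUMPTION: 3.0 GHz is a made-up clock speed for illustration;
    // substitute the real frequency of the machine under test.
    let clock_ghz = 3.0;
    // Per-iteration times taken from the outputs quoted in this answer.
    for &ns in &[0.32, 0.60, 0.26] {
        println!(
            "{:.2} ns per iteration is about {:.2} cycles at {:.1} GHz",
            ns,
            cycles_per_iteration(ns, clock_ghz),
            clock_ghz
        );
    }
}
```

Under that assumed clock, 0.32 ns works out to roughly one cycle per iteration, which is the order of magnitude discussed above.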

Another interesting thing I discovered is that if I just check c % 137 == 0 but don't print the actual value of c in the Rust println! statement (only print "Count is divisible by 137"), the iteration cost drops to just 0.26 ns. So Rust was able to eliminate a lot of work from the loop when I didn't ask for the exact value of c.
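Since the way c is consumed changes what the optimizer keeps, one alternative to the c % 137 trick is `std::hint::black_box` (stable since Rust 1.66). The sketch below is not what the measurements above used. `black_box` is documented as best-effort, so treat it as a hint rather than a guarantee. `count_triples` is a name introduced here for illustration.

```rust
use std::hint::black_box;

/// Counts (i, j, k) in 0..n with i*i + j*j == k*k, as in the benchmark.
fn count_triples(n: i32) -> u32 {
    let mut c = 0;
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                if i * i + j * j == k * k {
                    c += 1;
                }
            }
        }
    }
    c
}

fn main() {
    // black_box on the input discourages the compiler from specializing
    // the loops for a known constant bound.
    let n = black_box(100);
    // black_box on the result marks the full value of c as used,
    // without needing to print it or test its divisibility.
    let c = black_box(count_triples(n));
    let _ = c;
}
```

With this approach the exact value of c is treated as observed, so the 0.26 ns shortcut described above should not apply.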


UPDATE

As discussed in the comments with @trentci, I mimicked the Java code more completely, adding an outer loop that repeats the measurement, which is now in a separate function:

use std::time::SystemTime;

const NUM_ITERS: i32 = 1000;
const MEASURE_TIMES: i32 = 7;

fn main() {
    let total_iters: f64 = NUM_ITERS as f64 * NUM_ITERS as f64 * NUM_ITERS as f64;
    for _ in 0..MEASURE_TIMES {
        let took_ns = benchmark() as f64;
        println!("{} ns per iteration", took_ns / total_iters);
    }
}

fn benchmark() -> u128 {
    let mut c = 0;

    let t_init = SystemTime::now();
    for i in 0..NUM_ITERS {
        for j in 0..NUM_ITERS {
            for k in 0..NUM_ITERS {
                if i*i + j*j == k*k {
                    c += 1;
                }
            }
        }
    }
    // Use c to ensure its computation can't be elided by the optimizer
    if c % 137 == 0 {
        println!("Count is divisible by 137: {}", c);
    }
    return t_init.elapsed().unwrap().as_nanos();
}

Now I'm getting this output:

0.781475 ns per iteration
0.760657 ns per iteration
0.783821 ns per iteration
0.777313 ns per iteration
0.766473 ns per iteration
0.774042 ns per iteration
0.766718 ns per iteration

This is another subtle change to the code that resulted in a significant change in performance. However, it also shows a key advantage of Rust over Java: no warmup is needed to get optimum performance.
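For anyone repeating the no-warmup check, a variant that times each run with `std::time::Instant` may be worth trying. Unlike `SystemTime`, `Instant` is monotonic, so `elapsed()` returns a `Duration` directly and needs no `unwrap()`. This is only a sketch: `time_it` and `count_triples` are names introduced here, and the loop bound is reduced to 200 to keep the run short.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed
/// time measured on the monotonic clock.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

/// Counts (i, j, k) in 0..n with i*i + j*j == k*k, as in the benchmark.
fn count_triples(n: i32) -> u32 {
    let mut c = 0;
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                if i * i + j * j == k * k {
                    c += 1;
                }
            }
        }
    }
    c
}

fn main() {
    // ASSUMPTION: 200 instead of 1000, purely to keep this sketch fast.
    const N: i32 = 200;
    let total_iters = (N as f64).powi(3);
    for run in 0..3 {
        let (c, took) = time_it(|| count_triples(N));
        println!(
            "run {}: {:.3} ns per iteration (count {})",
            run,
            took.as_nanos() as f64 / total_iters,
            c
        );
    }
}
```

If no warmup is needed, the first run should report roughly the same per-iteration time as the later ones.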
