Scala performance on a primes algorithm

Question

I'm quite new to Scala, so to start writing some code I implemented this simple program:

package org.primes.sim

object Primes {

  def is_prime(a: Int): Boolean = {
    val l = Stream.range(3, a, 2) filter { e => a % e == 0}
    l.size == 0
  }

  def gen_primes(m: Int) = 
    2 #:: Stream.from(3, 2) filter { e => is_prime(e) } take m

  def primes(m : Int) = {
      gen_primes(m) foreach println      
  }

  def main(args: Array[String]) {
    if (args.size == 0)
      primes(10)
    else
      primes(args(0).toInt)
  }

}

It generates n primes starting from 2. I then implemented the same algorithm in C++11 using Eric Niebler's range-v3 library. This is the code:

#include <iostream>
#include <vector>
#include <string>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

inline bool is_even(unsigned int n) { return n % 2 == 0; }

inline bool is_prime(unsigned int n)
{
    if (n == 2)
        return true;
    else if (n == 1 || is_even(n))
        return false;
    else
        return ranges::any_of(
                view::iota(3, n) | view::remove_if(is_even),
                [n](unsigned int e) { return n % e == 0; }
            ) == false;
}

void primes(unsigned int n)
{
    auto rng = view::ints(2) | view::filter(is_prime);
    ranges::for_each(view::take(rng, n), [](unsigned int e){ cout << e << '\n'; });
}

int main(int argc, char* argv[])
{
    if (argc == 1)
        primes(100);
    else if (argc > 1)
    {
        primes(std::stoi(argv[1]));
    }
}

As you can see, the code looks very similar, but the performance is very different:

For n = 5000, the C++ version completes in 0.265 s, whereas the Scala version takes 24.314 s! So, from this test, Scala seems about 100x slower than C++11.

What is the problem with the Scala code? Could you give me some hints on making better use of scalac?

Note: I compiled the C++ code with gcc 4.9.2 and -O3.

Thanks

Answer 1

The main speed problem lies with your is_prime implementation.

First of all, you filter a Stream to find all divisors and then check whether there were none (l.size == 0), which forces the whole Stream to be evaluated. It is faster to return false as soon as the first divisor is found:

def is_prime(a: Int): Boolean =
  Stream.range(3, a, 2).find(a % _ == 0).isEmpty

This decreased the runtime of primes(5000) from 22 seconds to 5 seconds on my machine.

The second problem is Stream itself. Scala Streams are slow (each element is a lazily evaluated, memoized cell on the heap), and using them for simple numeric calculations is overkill. Replacing Stream with Range decreased the runtime further, to 1.2 seconds:

def is_prime(a: Int): Boolean =
  3.until(a, 2).find(a % _ == 0).isEmpty

That's decent: about 5x slower than C++. Usually I'd stop here, but the running time can be reduced a bit more by removing the higher-order function find.

While nice-looking and functional, find also adds some overhead. A loop implementation (basically replacing find with foreach) decreased the runtime further to 0.45 seconds, which is less than 2x slower than C++ (already on the order of JVM overhead):

def is_prime(a: Int): Boolean = {
  for (e <- 3.until(a, 2)) if (a % e == 0) return false
  true
}
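
As a side note on the loop above: in Scala 2 a for over a Range is rewritten to a foreach call with a closure, and a return inside that closure is implemented with a control-flow exception. A plain while loop avoids both. The following is only a sketch of that idea; the names PrimesWhile and isPrimeWhile are made up for illustration, and I have not measured whether it is actually faster on the setup above.

```scala
object PrimesWhile {
  // Hypothetical variant of is_prime above, written with a plain while loop.
  // Like the original, it only checks odd divisors, so it is only meaningful
  // for 2 and for odd values of a (which is all gen_primes ever passes in).
  def isPrimeWhile(a: Int): Boolean = {
    var e = 3
    while (e < a) {
      if (a % e == 0) return false // ordinary return: no closure involved
      e += 2
    }
    true
  }
}
```

It could be swapped in for is_prime in gen_primes without other changes.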

There's another Stream in gen_primes, so changing it may improve the run time further, but in my opinion that's not necessary. At this point it would be better to switch to a different algorithm for generating primes: e.g., using only primes, instead of all odd numbers, as candidate divisors, or using the Sieve of Eratosthenes.

All in all, functional abstractions in Scala are implemented with actual objects on the heap, which carry some overhead, and the JIT compiler can't fix everything. The selling point of C++, by contrast, is zero-cost abstractions: everything that can be is expanded at compile time through templates and constexpr, and then aggressively optimized by the compiler.
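
The two alternatives mentioned above could look roughly like the sketch below. It is not benchmarked, and the names PrimeSketches, firstPrimes and primesUpTo are hypothetical. Two assumptions go beyond what is said above: the trial-division version stops as soon as the square of the divisor exceeds the candidate, and the sieve takes an upper limit rather than a count, so the caller has to pick a limit large enough to contain the primes they want.

```scala
import scala.collection.mutable.ArrayBuffer

object PrimeSketches {

  // Trial division that uses only the primes found so far as divisors.
  // Assumption: stopping once p * p > candidate is an extra shortcut.
  def firstPrimes(n: Int): Vector[Int] = {
    val found = ArrayBuffer.empty[Int]
    var candidate = 2
    while (found.length < n) {
      var i = 0
      var isPrime = true
      while (isPrime && i < found.length &&
             found(i).toLong * found(i) <= candidate) {
        if (candidate % found(i) == 0) isPrime = false
        i += 1
      }
      if (isPrime) found += candidate
      // Candidates are 2, then 3, 5, 7, 9, ... (odd numbers only).
      candidate += (if (candidate == 2) 1 else 2)
    }
    found.toVector
  }

  // Sieve of Eratosthenes: all primes less than or equal to limit.
  def primesUpTo(limit: Int): Vector[Int] = {
    if (limit < 2) return Vector.empty
    val composite = new Array[Boolean](limit + 1)
    var p = 2
    while (p.toLong * p <= limit) {
      if (!composite(p)) {
        // Mark every multiple of p, starting from p * p, as composite.
        var multiple = p * p
        while (multiple <= limit) {
          composite(multiple) = true
          multiple += p
        }
      }
      p += 1
    }
    (2 to limit).filter(i => !composite(i)).toVector
  }
}
```

For example, PrimeSketches.firstPrimes(5000) foreach println would play the role of primes(5000) in the question.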


Answer 1
23, accepted, 2015-05-03 12:34:21
