
Scala performance on primes algorithm

I'm quite new to Scala, so to start writing some code I implemented this simple program:

package org.primes.sim

object Primes {

  def is_prime(a: Int): Boolean = {
    val l = Stream.range(3, a, 2) filter { e => a % e == 0}
    l.size == 0
  }

  def gen_primes(m: Int) = 
    2 #:: Stream.from(3, 2) filter { e => is_prime(e) } take m

  def primes(m : Int) = {
      gen_primes(m) foreach println      
  }

  def main(args: Array[String]) {
    if (args.size == 0)
      primes(10)
    else
      primes(args(0).toInt)
  }

}

It generates the first n primes, starting from 2. Then I implemented the same algorithm in C++11 using Eric Niebler's range-v3 library. This is the code:

#include <iostream>
#include <vector>
#include <string>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

inline bool is_even(unsigned int n) { return n % 2 == 0; }

inline bool is_prime(unsigned int n)
{
    if (n == 2)
        return true;
    else if (n == 1 || is_even(n))
        return false;
    else
        return ranges::any_of(
                view::iota(3, n) | view::remove_if(is_even),
                [n](unsigned int e) { return n % e == 0; }
            ) == false;
}

void primes(unsigned int n)
{
    auto rng = view::ints(2) | view::filter(is_prime);
    ranges::for_each(view::take(rng, n), [](unsigned int e){ cout << e << '\n'; });
}

int main(int argc, char* argv[])
{
    if (argc == 1)
        primes(100);
    else if (argc > 1)
    {
        primes(std::stoi(argv[1]));
    }
}

As you can see, the code looks very similar, but the performance is very different:

For n = 5000, the C++ version completes in 0.265 s, whereas the Scala version takes 24.314 s! So, from this test, Scala seems roughly 90x slower than C++11 (24.314 / 0.265 ≈ 92).
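Wall-clock timings of a whole program run also include JVM start-up and JIT warm-up, not just the prime computation. A minimal sketch for timing the question's Stream-based version from inside the JVM follows. The names Timing, timed, isPrimeStream and firstPrimesStream, and the default of 500 primes, are hypothetical; the three repeated runs are only there so that later runs can reflect JIT warm-up. This is a rough measurement, not a substitute for a proper benchmarking harness, and Stream will trigger deprecation warnings on Scala 2.13 and Scala 3.

```scala
// Rough in-process timing sketch (hypothetical names, not from the question).
object Timing {
  /** Runs body once and returns its result together with the elapsed time in ms. */
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }

  // The question's primality test: filter all divisors, then check the size.
  def isPrimeStream(a: Int): Boolean = {
    val l = Stream.range(3, a, 2) filter { e => a % e == 0 }
    l.size == 0
  }

  // Equivalent of the question's gen_primes, materialised into a List.
  def firstPrimesStream(m: Int): List[Int] =
    (2 #:: Stream.from(3, 2).filter(isPrimeStream)).take(m).toList

  def main(args: Array[String]): Unit = {
    val n = if (args.isEmpty) 500 else args(0).toInt
    // Repeat the run: JVM start-up is excluded, and later runs include JIT effects.
    for (run <- 1 to 3) {
      val (primes, ms) = timed(firstPrimesStream(n))
      println(f"run $run: ${primes.size} primes in $ms%.1f ms")
    }
  }
}
```

Calling Timing.main(Array("5000")) prints one line per run; the actual numbers depend on the machine, so none are given here.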

What is the problem with the Scala code? Could you give me some hints on how to use scalac better?

Note: I compiled the C++ code with gcc 4.9.2 and -O3.

Thanks

The main speed problem lies with your is_prime implementation.

First of all, you filter a Stream to find all divisors and then check whether there were none (l.size == 0). Calling size forces the entire filtered stream, so every odd number below a is tested even after a divisor has been found. It's faster to return false as soon as the first divisor is found:

def is_prime(a: Int): Boolean =
  Stream.range(3, a, 2).find(a % _ == 0).isEmpty

This decreased the runtime of primes(5000) on my machine from 22 seconds to 5 seconds.
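To make the difference concrete, here is a small sketch that counts how many remainders each version computes. The names DivisorChecks, checksWithFilter and checksWithFind are hypothetical; the code still uses Stream, as above, so newer compilers will print deprecation warnings.

```scala
// Counts predicate evaluations (hypothetical helper, for illustration only).
object DivisorChecks {
  /** The question's approach: filter every candidate, then look at the size. */
  def checksWithFilter(a: Int): Int = {
    var checks = 0
    val l = Stream.range(3, a, 2) filter { e => checks += 1; a % e == 0 }
    l.size // forces the whole filtered stream, so every candidate is tested
    checks
  }

  /** The find-based approach: traversal stops at the first divisor. */
  def checksWithFind(a: Int): Int = {
    var checks = 0
    Stream.range(3, a, 2).find { e => checks += 1; a % e == 0 }
    checks
  }
}
```

For a = 9999, which is divisible by 3, checksWithFilter(9999) returns 4998 (every odd candidate from 3 to 9997), while checksWithFind(9999) returns 1. For a prime such as 9973 both versions must try every candidate, so the early exit helps only on composite candidates.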

The second problem is Stream itself. Scala Streams are slow (each element lives in its own memoized cons cell and is boxed), and using them for simple numeric loops is huge overkill. Replacing Stream with Range decreased the runtime further, to 1.2 seconds:

def is_prime(a: Int): Boolean =
  3.until(a, 2).find(a % _ == 0).isEmpty

That's decent: about 5x slower than C++. Usually I'd stop here, but the running time can be decreased a bit more by removing the higher-order function find.

While nice-looking and functional, find also incurs some overhead. A loop implementation (basically replacing find with foreach) further decreased the runtime to 0.45 seconds, which is less than 2x slower than C++ (that's already on the order of the JVM's own overhead):

def is_prime(a: Int): Boolean = {
  for (e <- 3.until(a, 2)) if (a % e == 0) return false
  true
}
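Another way to drop the closure is a tail-recursive helper, sketched below under the same contract as the original is_prime (a is 2 or an odd number, which gen_primes guarantees). scalac rewrites a self-recursive call in tail position into a loop, and the @tailrec annotation makes compilation fail if it cannot. This also avoids the return inside the for body, which in Scala 2 is a non-local return implemented by throwing and catching a control exception, a form that recent Scala 3 versions deprecate. LoopPrimes and noDivisorFrom are hypothetical names, and this variant is not benchmarked here.

```scala
import scala.annotation.tailrec

// Trial division by odd numbers, written as a tail-recursive loop (hypothetical names).
object LoopPrimes {
  /** Assumes a is 2 or odd, exactly like the original is_prime. */
  def isPrime(a: Int): Boolean = {
    @tailrec
    def noDivisorFrom(e: Int): Boolean =
      if (e >= a) true           // ran past the last candidate: no divisor found
      else if (a % e == 0) false // first divisor found: stop immediately
      else noDivisorFrom(e + 2)  // try the next odd candidate
    noDivisorFrom(3)
  }
}
```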

There's another Stream in gen_primes, so changing it might improve the run time further, but in my opinion that's not necessary. At this point, I think it would be better to switch to a different algorithm for generating primes: e.g., using only primes, instead of all odd numbers, as candidate divisors, or using the Sieve of Eratosthenes.
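The two alternatives mentioned above can be sketched as follows; the generator also replaces the Stream in gen_primes with a plain buffer. All names (Alternatives, firstPrimes, primesBelow) are hypothetical, and stopping trial division once p * p exceeds the candidate is an extra optimization that the answer itself does not mention. A sieve needs an upper limit rather than a count, so to get the first n primes from it one has to pick a large enough limit (or retry with a bigger one).

```scala
import scala.collection.mutable.ArrayBuffer

// Sketches of the two alternative algorithms (hypothetical names, not tuned).
object Alternatives {
  /** The first m primes, trial-dividing each candidate only by primes found so far. */
  def firstPrimes(m: Int): Vector[Int] = {
    val found = ArrayBuffer.empty[Int]
    var candidate = 2
    while (found.size < m) {
      // A composite's smallest prime factor p satisfies p * p <= candidate
      // (this square-root cutoff is an extra optimization, not from the answer).
      val hasDivisor = found.iterator
        .takeWhile(p => p.toLong * p <= candidate)
        .exists(p => candidate % p == 0)
      if (!hasDivisor) found += candidate
      candidate = if (candidate == 2) 3 else candidate + 2 // only odd numbers after 2
    }
    found.toVector
  }

  /** All primes strictly below limit, via the Sieve of Eratosthenes. */
  def primesBelow(limit: Int): Vector[Int] =
    if (limit <= 2) Vector.empty
    else {
      val composite = new Array[Boolean](limit) // composite(k) means k is not prime
      var i = 2
      while (i.toLong * i < limit) {
        if (!composite(i)) {
          var j = i * i // smaller multiples were crossed out by smaller primes
          while (j < limit) { composite(j) = true; j += i }
        }
        i += 1
      }
      (2 until limit).filter(k => !composite(k)).toVector
    }
}
```

For example, firstPrimes(10) and primesBelow(30) both give 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.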

All in all, functional abstractions in Scala are implemented with actual objects on the heap, which carry some overhead, and the JIT compiler can't optimize all of it away. The selling point of C++, by contrast, is zero-cost abstractions: everything possible is expanded at compile time through templates and constexpr, and then aggressively optimized by the compiler.
