Scala performance on primes algorithm
I'm quite new to Scala, so to start writing some code I implemented this simple program:
    package org.primes.sim

    object Primes {
      def is_prime(a: Int): Boolean = {
        val l = Stream.range(3, a, 2) filter { e => a % e == 0}
        l.size == 0
      }

      def gen_primes(m: Int) =
        2 #:: Stream.from(3, 2) filter { e => is_prime(e) } take m

      def primes(m : Int) = {
        gen_primes(m) foreach println
      }

      def main(args: Array[String]) {
        if (args.size == 0)
          primes(10)
        else
          primes(args(0).toInt)
      }
    }
It generates n primes starting from 2. Then I implemented the same algorithm in C++11 using Eric Niebler's range-v3 library. This is the code:
    #include <iostream>
    #include <vector>
    #include <string>
    #include <range/v3/all.hpp>

    using namespace std;
    using namespace ranges;

    inline bool is_even(unsigned int n) { return n % 2 == 0; }

    inline bool is_prime(unsigned int n)
    {
        if (n == 2)
            return true;
        else if (n == 1 || is_even(n))
            return false;
        else
            return ranges::any_of(
                    view::iota(3, n) | view::remove_if(is_even),
                    [n](unsigned int e) { return n % e == 0; }
                ) == false;
    }

    void primes(unsigned int n)
    {
        auto rng = view::ints(2) | view::filter(is_prime);
        ranges::for_each(view::take(rng, n), [](unsigned int e){ cout << e << '\n'; });
    }

    int main(int argc, char* argv[])
    {
        if (argc == 1)
            primes(100);
        else if (argc > 1)
        {
            primes(std::stoi(argv[1]));
        }
    }
As you can see, the code looks very similar, but the performance is very different:
For n = 5000, C++ completes in 0.265s, whereas Scala takes 24.314s! So, from this test, Scala seems roughly 100x slower than C++11.
What is the problem with the Scala code? Could you give me some hints for better usage of scalac?
Note: I compiled the C++ code with gcc 4.9.2 and -O3.
Thanks
The main speed problem lies with your is_prime implementation.
First of all, you filter a Stream to find all divisors, and then check whether there were none (l.size == 0). But it's faster to return false as soon as the first divisor is found:
    def is_prime(a: Int): Boolean =
      Stream.range(3, a, 2).find(a % _ == 0).isEmpty
This decreased the runtime of primes(5000) from 22 seconds to 5 seconds on my machine.
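To see why stopping early matters, here is a small sketch (not part of the original code) that counts how many candidate divisors each approach tests. It uses a plain Range instead of a Stream to keep it short, and the names ShortCircuitDemo, testsWithFilter and testsWithFind are made up for illustration:

```scala
object ShortCircuitDemo {
  // Collects every divisor first, so every odd candidate below a is tested.
  def testsWithFilter(a: Int): Int = {
    var tests = 0
    (3 until a by 2).filter { e => tests += 1; a % e == 0 }
    tests
  }

  // Stops at the first divisor, so testing ends as soon as one is found.
  def testsWithFind(a: Int): Int = {
    var tests = 0
    (3 until a by 2).find { e => tests += 1; a % e == 0 }
    tests
  }

  def main(args: Array[String]): Unit = {
    val a = if (args.isEmpty) 9999 else args(0).toInt
    println(s"filter tested ${testsWithFilter(a)} candidates")
    println(s"find tested ${testsWithFind(a)} candidates")
  }
}
```

For a composite such as 9999 (divisible by 3), the find version stops after the first candidate, while the filter version still walks the whole range. For a prime input both have to test every candidate, so the gain comes entirely from the composites.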
The second problem is Stream itself. Scala Streams are slow, and using them for simple number calculations is huge overkill. Replacing Stream with Range decreased the runtime further to 1.2 seconds:
    def is_prime(a: Int): Boolean =
      3.until(a, 2).find(a % _ == 0).isEmpty
That's decent: 5x slower than C++. Usually I'd stop here, but it is possible to decrease the running time a bit more if we remove the higher-order function find.
While nice-looking and functional, find also induces some overhead. A loop implementation (basically replacing find with foreach) further decreased the runtime to 0.45 seconds, which is less than 2x slower than C++ (that's already on the order of JVM overhead):
    def is_prime(a: Int): Boolean = {
      for (e <- 3.until(a, 2)) if (a % e == 0) return false
      true
    }
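One caveat with the for version: it is still sugar for a foreach call with a closure, and a return inside that closure is a non-local return, which Scala 2 implements by throwing a control exception. As a possible further step, here is a sketch of the same check written as a plain while loop. I have not timed it, so treat any speedup as an assumption; PrimeLoop and isPrimeLoop are names I made up for this example:

```scala
object PrimeLoop {
  // Same test as the for-loop version, written as a plain while loop.
  // Like the original, it assumes a is 2 or an odd number >= 3
  // (even numbers are never checked as divisors).
  def isPrimeLoop(a: Int): Boolean = {
    var e = 3
    while (e < a) {
      if (a % e == 0) return false // ordinary local return, no closure involved
      e += 2
    }
    true
  }

  def main(args: Array[String]): Unit = {
    // Prints the odd numbers below 30 that pass the check.
    (3 until 30 by 2).filter(isPrimeLoop).foreach(println)
  }
}
```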
There's another Stream in gen_primes, so doing something with it may improve the runtime further, but in my opinion that's not necessary. At this point in performance improvement, I think it would be better to switch to a different algorithm for generating primes: e.g., using only primes, instead of all odd numbers, to look for divisors, or using the Sieve of Eratosthenes.
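As a rough illustration of the first suggestion (dividing only by primes already found), here is a minimal sketch. It is not the code from the question, the names PrimesByTrialDivision and firstPrimes are hypothetical, and it adds one optimization that was not mentioned above: it stops testing once the divisor squared exceeds the candidate.

```scala
import scala.collection.mutable.ArrayBuffer

object PrimesByTrialDivision {
  // Returns the first n primes, testing each odd candidate only against
  // the primes found so far.
  def firstPrimes(n: Int): Vector[Int] = {
    val found = ArrayBuffer.empty[Int]
    if (n > 0) found += 2
    var candidate = 3
    while (found.length < n) {
      var i = 1 // index 0 holds 2, which never divides an odd candidate
      var isPrime = true
      // Stop once p * p exceeds the candidate: a composite number always
      // has a prime factor no larger than its square root.
      while (isPrime && i < found.length && found(i).toLong * found(i) <= candidate) {
        if (candidate % found(i) == 0) isPrime = false
        i += 1
      }
      if (isPrime) found += candidate
      candidate += 2
    }
    found.toVector
  }

  def main(args: Array[String]): Unit = {
    val n = if (args.isEmpty) 10 else args(0).toInt
    firstPrimes(n).foreach(println)
  }
}
```

Calling firstPrimes(10) yields 2, 3, 5, 7, 11, 13, 17, 19, 23, 29. The mutable buffer and while loops are a deliberate choice here, in line with the point about avoiding Stream and closure overhead in the hot path.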
All in all, functional abstractions in Scala are implemented with actual objects on the heap, which have some overhead, and the JIT compiler can't fix everything. But the selling point of C++ is zero-cost abstractions: everything that can be is expanded during compilation through templates and constexpr, and then further aggressively optimized by the compiler.