为什么Scala的尾递归慢于Java？

Question

Scala code for simple addition using tail recursion 使用尾递归进行简单加法的Scala代码

def add(list : List[Int],sum:Int):Int = {
    //Thread.dumpStack()
    if (list.isEmpty) {
        sum
    } else {
        val headVal  = list.head
        add(list.tail, sum + headVal)
    }
}

Below is the java code for addition in recursive mode. 下面是递归模式下添加的java代码。

public static int add(List<Integer> list, Integer sum) {
    // Thread.dumpStack();
    if (list.isEmpty()) {
        return sum;
    } else {
        int headVal = list.remove(0);
        return add(list, sum + headVal);
    }
}

The Java version runs at least 10 times faster. Java版本运行速度至少快10倍。 Ran this for 1000 entries. 这是1000条目。 Measured time by using System.nanoTime() API before and after. 通过使用System.nanoTime() API之前和之后的测量时间。

Scala version 2.10, java version Java 7. Same JVM properties for both run. Scala版本2.10，Java版本Java 7.两个运行的JVM属性相同。

Answer 1

First of all the Scala method add you have shown is not in a (class) context. 首先斯卡拉方法add你表现出不是一个（类）上下文。 If you have this method in a class, the tail recursion optimization cannot be applied, since the method is neither final nor private . 如果在类中使用此方法， 则无法应用尾递归优化，因为该方法既不是final也不是private 。 If you add @tailrec , compilation will fail. 如果添加@tailrec ，编译将失败。 If I run it with 10000, it results in a stack overflow. 如果我用10000运行它，它会导致堆栈溢出。

As for the Java version: The Java version is using a mutable List : The head/tail decomposition alters the underlying list. 至于Java版本：Java版本使用可变 List ：头/尾分解改变了基础列表。 So after summing you can't use that list any longer, since it is empty. 因此，在总结之后，您不能再使用该列表，因为它是空的。

Further the List in Scala has a completely different meaning than a Java List ; 此外，Scala中的List与Java List具有完全不同的含义; the Scala list is meant for head/tail decomposition. Scala列表用于头/尾分解。 As far as I know java.util.List has no tail method, and neither does the Java compiler apply tailrec optimizations, so the comparision is "unfair". 据我所知java.util.List没有tail方法，Java编译器也没有应用tailrec优化，因此比较是“不公平的”。

Anyway, I have run some JMH-based tests on different scenarios. 无论如何，我已经在不同的场景上运行了一些基于JMH的测试。

The only two scenarios you can really compare are "Scala while" and "Java for". 您可以真正比较的唯一两个场景是“Scala while”和“Java for”。 They neither use OOP nor functional programming, it's just imperative. 他们既不使用OOP也不使用函数编程，这只是必要的。

The results of five different Scala scenarios 五种不同Scala场景的结果

(Please scroll to the right, there is a small description in the last column) （请滚动到右侧，最后一栏中有一个小描述）

Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.s.b.Benchmark5a.run    thrpt        10      238,515        7,769   ops/ms Like in the question, but with @tailrec
a.b.s.b.Benchmark5b.run    thrpt        10      130,202        2,232   ops/ms Using List.sum
a.b.s.b.Benchmark5c.run    thrpt        10     2756,132       29,920   ops/ms while (no List, vars, imperative)
a.b.s.b.Benchmark5d.run    thrpt        10      237,286        2,203   ops/ms tailrec version with pattern matching
a.b.s.b.Benchmark5e.run    thrpt        10      214,719        2,483   ops/ms Like in the question (= no tailrec opt)

5a and 5e are like in the question; 5a和5e在问题中是相似的; 5a with @tailrec . 5a与@tailrec 。
5b: List.sum : Was very slow 5b： List.sum ：非常慢
5c: Not a big surprise, the imperative version is the fastest one. 5c：不是一个大惊喜，命令式版本是最快的版本。
5d uses pattern matching instead of if (that would have been "my style"), I am adding the source of this one: 5d使用模式匹配而不是if （那将是“我的风格”），我添加了这个的来源：

package app.benchmark.scala.benchmark5

import scala.annotation._
import org.openjdk.jmh.annotations.GenerateMicroBenchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import org.openjdk.jmh.runner.Runner
import org.openjdk.jmh.runner.RunnerException
import org.openjdk.jmh.runner.options.Options
import org.openjdk.jmh.runner.options.OptionsBuilder

@State(Scope.Benchmark)
object BenchmarkState5d {
  val list = List.range(1, 1000)
}

class Benchmark5d {
  private def add(list : List[Int]): Int = {
    @tailrec
    def add(list : List[Int], sum: Int): Int = {
      list match {
        case Nil =>
          sum
        case h :: t =>
          add(t, h + sum)
      }
    }
    add(list, 0)
  }

  @GenerateMicroBenchmark
  def run() = {
    add(BenchmarkState5d.list)
  }
}

The three Java scenarios 三种Java场景

Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.j.b.Benchmark5a.run    thrpt        10       40,437        0,532   ops/ms mutable (rebuilds the list in each iteration)
a.b.j.b.Benchmark5b.run    thrpt        10        0,450        0,008   ops/ms subList
a.b.j.b.Benchmark5c.run    thrpt        10     2735,951       29,177   ops/ms for

If you really want to compare in the meaning of functional programming style (= immutable, tail recursion, head/tail decomposition), than the Java version is five times slower. 如果你真的想要在函数式编程风格（= immutable，tail recursion，head / tail分解）的意义上进行比较，那么Java版本要慢五倍。

As pointed out in a comment by Marko Topolnik: 正如Marko Topolnik在评论中所指出的那样：

subList doesn't copy the tail, but it does something comparably bad, when applied to a LinkedList : it wraps the original list and uses an offset to accomodate the semantics. subList不会复制尾部，但是当它应用于LinkedList时它会做一些比较糟糕的事情：它包装原始列表并使用偏移量来容纳语义。 The result is that an O(n) recursive algorithm becomes O(n2)---the same as if the tail was repeatedly copied. 结果是O（n）递归算法变为O（n2）---就像重复复制尾部一样。 Plus, wrappers accrete so you end up with the list wrapped a thousand times. 另外，包装器会增加，因此您最终会将列表包裹一千次。 Definitely not comparable to a head/tail list 绝对不能与头/尾名单相比

public class Benchmark5a {
    public static int add(List<Integer> list, Integer sum) {
        if (list.isEmpty()) {
            return sum;
        } else {
            int headVal = list.remove(0);
            return add(list, sum + headVal);
        }
    }

    @GenerateMicroBenchmark
    public long run() {
        final List<Integer> list = new LinkedList<Integer>();
        for(int i = 0; i < 1000; i++) {
            list.add(i);
        }
        return add(list, 0);
    }

    public static void main(String[] args) {
        System.out.println(new Benchmark5a().run());
    }
}

@State(Scope.Benchmark)
class BenchmarkState5b {
    public final static List<Integer> list = new LinkedList<Integer>();

    static {
        for(int i = 0; i < 1000; i++) {
            list.add(i);
        }
    }
}

public class Benchmark5b {
    public static int add(List<Integer> list, int sum) {
        if (list.isEmpty()) {
            return sum;
        } else {
            int headVal = list.get(0);
            return add(list.subList(1, list.size()), sum + headVal);
        }
    }

    @GenerateMicroBenchmark
    public long run() {
        return add(BenchmarkState5b.list, 0);
    }

    public static void main(String[] args) {
        System.out.println(new Benchmark5b().run());
    }
}

Scala results in detail Scala结果很详细

(all results only show the last scenario, and the overall results) （所有结果仅显示最后一个场景，以及整体结果）

[...]

# VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
# VM options: <none>
# Fork: 1 of 1
# Warmup: 3 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: app.benchmark.scala.benchmark5.Benchmark5e.run
# Warmup Iteration   1: 166,153 ops/ms
# Warmup Iteration   2: 215,242 ops/ms
# Warmup Iteration   3: 216,632 ops/ms
Iteration   1: 215,526 ops/ms
Iteration   2: 213,720 ops/ms
Iteration   3: 213,967 ops/ms
Iteration   4: 215,468 ops/ms
Iteration   5: 216,247 ops/ms
Iteration   6: 217,514 ops/ms
Iteration   7: 215,503 ops/ms
Iteration   8: 211,969 ops/ms
Iteration   9: 212,989 ops/ms
Iteration  10: 214,291 ops/ms

Result : 214,719 ±(99.9%) 2,483 ops/ms
  Statistics: (min, avg, max) = (211,969, 214,719, 217,514), stdev = 1,642
  Confidence interval (99.9%): [212,236, 217,202]


Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.s.b.Benchmark5a.run    thrpt        10      238,515        7,769   ops/ms
a.b.s.b.Benchmark5b.run    thrpt        10      130,202        2,232   ops/ms
a.b.s.b.Benchmark5c.run    thrpt        10     2756,132       29,920   ops/ms
a.b.s.b.Benchmark5d.run    thrpt        10      237,286        2,203   ops/ms
a.b.s.b.Benchmark5e.run    thrpt        10      214,719        2,483   ops/ms

Java results in detail Java结果详细

# VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
# VM options: <none>
# Fork: 1 of 1
# Warmup: 3 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: app.benchmark.java.benchmark5.Benchmark5c.run
# Warmup Iteration   1: 2777,495 ops/ms
# Warmup Iteration   2: 2888,040 ops/ms
# Warmup Iteration   3: 2692,851 ops/ms
Iteration   1: 2737,169 ops/ms
Iteration   2: 2745,368 ops/ms
Iteration   3: 2754,105 ops/ms
Iteration   4: 2706,131 ops/ms
Iteration   5: 2721,593 ops/ms
Iteration   6: 2769,261 ops/ms
Iteration   7: 2734,461 ops/ms
Iteration   8: 2741,494 ops/ms
Iteration   9: 2740,012 ops/ms
Iteration  10: 2709,915 ops/ms

Result : 2735,951 ±(99.9%) 29,177 ops/ms
  Statistics: (min, avg, max) = (2706,131, 2735,951, 2769,261), stdev = 19,299
  Confidence interval (99.9%): [2706,774, 2765,128]


Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.j.b.Benchmark5a.run    thrpt        10       40,437        0,532   ops/ms
a.b.j.b.Benchmark5b.run    thrpt        10        0,450        0,008   ops/ms
a.b.j.b.Benchmark5c.run    thrpt        10     2735,951       29,177   ops/ms

Update: Added another Java scenario 5d using ArrayList 更新：使用ArrayList添加了另一个Java方案5d

Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.j.b.Benchmark5a.run    thrpt        10       34,931        0,504   ops/ms
a.b.j.b.Benchmark5b.run    thrpt        10        0,430        0,005   ops/ms
a.b.j.b.Benchmark5c.run    thrpt        10     2610,085        9,664   ops/ms
a.b.j.b.Benchmark5d.run    thrpt        10       56,693        1,218   ops/ms

Answer 2

You cannot draw any meaningful conclusions about performance from such a short experiment. 你不能从这么短的实验中得出任何有关性能的有意义的结论。 You might have hit a garbage collection, there might have been some other process hogging your CPU, all classes may not have been loaded and the JVM might be optimizing your code all while your test is running. 您可能遇到了垃圾收集，可能还有一些其他进程占用了CPU，所有类可能都没有加载，JVM可能会在您的测试运行时优化代码。 Any of which would adversely effect test results. 任何一种都会对测试结果产生不利影响。

Processing 1000 elements will be extremely fast. 处理1000个元素将非常快。 You should make your experiment long enough such that inaccuracy in timing measurement or external influences have a smaller effect. 您应该让实验足够长，以使定时测量或外部影响的不准确性产生较小的影响。 Try with a million elements and if that still only take a few seconds try 10 million. 尝试使用一百万个元素，如果仍然只需要几秒钟就可以尝试1000万个元素。

Consider running the test a few times when the JVM starts up before taking the measurement to "warm up" the JVM. 在JVM启动之前，请考虑运行测试几次，然后再进行测量以“预热”JVM。 You want to ensure that any classes that are lazy loaded have been loaded and that any realtime optimization performed by the JVM is complete before you start your test. 您希望确保已加载任何延迟加载的类，并且在开始测试之前JVM执行的任何实时优化都已完成。

Repeat the experiment a few more times using the longer list of elements. 使用较长的元素列表再重复几次实验。 Then throw away the fastest result and the slowest result and average the rest. 然后扔掉最快的结果和最慢的结果并平均其余的结果。

Answer 3

The given scala example might be improved: with tail recursive functions in Scala, you usually provide a compiler hint using the @tailrec annotation. 给定的scala示例可能会得到改进：使用Scala中的尾递归函数，您通常使用@tailrec批注提供编译器提示。 See here for more information. 有关更多信息，请参见此处 [Note: updated due to typos] [注：因拼写错误而更新]

Answer 4

I tried your code and the Scala version is about 5 times faster. 我尝试了你的代码，Scala版本的速度提高了大约5倍。 It takes less than 4 seconds while the Java version takes nearly 20. 它只需不到4秒，而Java版本需要将近20秒。

Java: Java的：

import java.util.List;
import java.util.LinkedList;

public class ListTest {
  public static int add(List<Integer> list, Integer sum) {
    if (list.isEmpty()) {
        return sum;
    } else {
        int headVal = list.remove(0);
        return add(list, sum + headVal);
    }
}

  public static void main(String[] args) {
    List<Integer> list = new LinkedList<>();
    int sum = 0;

    long start = System.nanoTime();
    for(int j = 0; j < 1000000; j++) {
      list.clear();
      for(int i = 1; i <= 1000; i++) list.add(i);
      sum = add(list, 0);
    }
    long end = System.nanoTime();
    System.out.println("time = " + ((end - start)/1e6) + "ms");
    System.out.println("sum = " + sum);
  }
}

Scala: 斯卡拉：

object ListTest {
  def add(list : List[Int],sum:Int): Int = {
    if (list.isEmpty) {
        sum
    } else {
        val headVal  = list.head
        add(list.tail, sum + headVal)
    }
  }

  def 

main(args: Array[String]) {
    val list = List.range(1, 1001)
    var sum = 0

    val start = System.nanoTime
    for(i <- 1 to 1000000) sum = add(list, 0);
    val end = System.nanoTime

    println("time = " + ((end - start)/1e6) + "ms")
    println("sum = " + sum)
  }
}

为什么Scala的尾递归慢于Java？

问题描述

4 个解决方案

解决方案1
6 2015-03-07 20:44:39

解决方案2
3 2015-03-07 15:45:57

解决方案3
0 2015-03-07 14:51:33

解决方案4
0 2015-03-07 17:01:40

为什么Scala的尾递归慢于Java？

问题描述

4 个解决方案

解决方案1 6 2015-03-07 20:44:39

解决方案2 3 2015-03-07 15:45:57

解决方案3 0 2015-03-07 14:51:33

解决方案4 0 2015-03-07 17:01:40

解决方案1
6 2015-03-07 20:44:39

解决方案2
3 2015-03-07 15:45:57

解决方案3
0 2015-03-07 14:51:33

解决方案4
0 2015-03-07 17:01:40