Ackermann very inefficient with Haskell/GHC

Question

I tried computing Ackermann(4,1) and found a big difference in performance between languages and compilers. Below are the results on my Core i7 3820QM, 16 GB RAM, Ubuntu 12.10 64-bit:

C: 1.6s, gcc -O3 (with gcc 4.7.2)

#include <stdio.h>

int ack(int m, int n) {
  if (m == 0) return n+1;
  if (n == 0) return ack(m-1, 1);
  return ack(m-1, ack(m, n-1));
}

int main() {
  printf("%d\n", ack(4,1));
  return 0;
}

OCaml: 3.6s, ocamlopt (with ocaml 3.12.1)

let rec ack = function
  | 0,n -> n+1
  | m,0 -> ack (m-1, 1)
  | m,n -> ack (m-1, ack (m, n-1))
in print_int (ack (4, 1))

Standard ML: 5.1s, mlton -codegen c -cc-opt -O3 (with mlton 20100608)

fun ack 0 n = n+1
  | ack m 0 = ack (m-1) 1
  | ack m n = ack (m-1) (ack m (n-1));
print (Int.toString (ack 4 1));

Racket: 11.5s, racket (with racket v5.3.3)

(require racket/unsafe/ops)

(define + unsafe-fx+)
(define - unsafe-fx-)
(define (ack m n)
  (cond
    [(zero? m) (+ n 1)]
    [(zero? n) (ack (- m 1) 1)]
    [else (ack (- m 1) (ack m (- n 1)))]))

(time (ack 4 1))

~~Haskell: unfinished, killed by system after 22s, ghc -O2 (with ghc 7.4.2)~~

Haskell: 1.8s, ajhc (with ajhc 0.8.0.4)

main = print $ ack 4 1
  where ack :: Int -> Int -> Int
        ack 0 n = n+1
        ack m 0 = ack (m-1) 1
        ack m n = ack (m-1) (ack m (n-1))

The Haskell version is the only one that fails to terminate properly, because it takes too much memory. It freezes my machine and fills the swap space before getting killed. What can I do to improve it without heavily fuglifying the code?

EDIT: I appreciate some of the asymptotically smarter solutions, but they are not exactly what I am asking for. This is more about seeing whether the compiler handles certain patterns in a reasonably efficient way (stack, tail calls, unboxing, etc.) than about computing the Ackermann function.

EDIT 2: As pointed out by several responses, this seems to be a bug in recent versions of GHC. I tried the same code with AJHC and got much better performance.

Thank you very much :)

Answer 1

NB: The high memory usage is a bug in the GHC RTS: on stack overflow, when new stacks were allocated on the heap, it was not checked whether a garbage collection was due. It has already been fixed in GHC HEAD.

I was able to get much better performance by CPS-converting ack:

module Main where

data P = P !Int !Int

main :: IO ()
main = print $ ack (P 4 1) id
  where
    ack :: P -> (Int -> Int) -> Int
    ack (P 0 n) k = k (n + 1)
    ack (P m 0) k = ack (P (m-1) 1) k
    ack (P m n) k = ack (P m (n-1)) (\a -> ack (P (m-1) a) k)

Your original function consumes all available memory on my machine, while this one runs in constant space.

$ time ./Test
65533
./Test  52,47s user 0,50s system 96% cpu 54,797 total

OCaml is still faster, however:

$ time ./test
65533./test  7,97s user 0,05s system 94% cpu 8,475 total

Edit: When compiled with JHC, your original program is about as fast as the OCaml version:

$ time ./hs.out 
65533
./hs.out  5,31s user 0,03s system 96% cpu 5,515 total

Edit 2: Something else I've discovered: running your original program with a larger stack chunk size (+RTS -kc1M) makes it run in constant space. The CPS version is still a bit faster, though.

Edit 3: I managed to produce a version that runs nearly as fast as the OCaml one by manually unrolling the main loop. However, it only works when run with +RTS -kc1M (Dan Doel has filed a bug about this behaviour):

{-# LANGUAGE CPP #-}
module Main where

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int

ack0 :: Int -> Int
ack0 n =(n+1)

#define C(a) a
#define CONCAT(a,b) C(a)C(b)

#define AckType(M) CONCAT(ack,M) :: Int -> Int

AckType(1)
AckType(2)
AckType(3)
AckType(4)

#define AckDecl(M,M1) \
CONCAT(ack,M) n = case n of { 0 -> CONCAT(ack,M1) 1 \
; 1 ->  CONCAT(ack,M1) (CONCAT(ack,M1) 1) \
; _ ->  CONCAT(ack,M1) (CONCAT(ack,M) (n-1)) }

AckDecl(1,0)
AckDecl(2,1)
AckDecl(3,2)
AckDecl(4,3)

ack :: P -> (Int -> Int) -> Int
ack (P m n) k = case m of
  0 -> k (ack0 n)
  1 -> k (ack1 n)
  2 -> k (ack2 n)
  3 -> k (ack3 n)
  4 -> k (ack4 n)
  _ -> case n of
    0 -> ack (P (m-1) 1) k
    1 -> ack (P (m-1) 1) (\a -> ack (P (m-1) a) k)
    _ -> ack (P m (n-1)) (\a -> ack (P (m-1) a) k)

main :: IO ()
main = print $ ack (P 4 1) id

Testing:

$ time ./Test +RTS -kc1M
65533
./Test +RTS -kc1M  6,30s user 0,04s system 97% cpu 6,516 total

Edit 4: Apparently, the space leak is fixed in GHC HEAD, so +RTS -kc1M won't be required in the future.
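The continuations built by the CPS version earlier in this answer only ever have one shape: "apply ack (m-1) to the result, then carry on". That means they can be stored as a plain list of pending first arguments instead of closures. The following is a minimal sketch of that idea, not code from the answer; the name ackStack is made up for illustration, and no timing claim is made for it.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- | Ackermann with an explicit stack. The list holds the first arguments
-- of the calls that still have to be applied to the current value of n,
-- innermost call first.
ackStack :: Int -> Int -> Int
ackStack m0 n0 = go [m0] n0
  where
    go :: [Int] -> Int -> Int
    go []       !n = n                              -- nothing pending: done
    go (0 : ms) !n = go ms (n + 1)                  -- ack 0 n = n + 1
    go (m : ms) 0  = go ((m - 1) : ms) 1            -- ack m 0 = ack (m-1) 1
    go (m : ms) !n = go (m : (m - 1) : ms) (n - 1)  -- inner call first,
                                                    -- then ack (m-1) on its result

main :: IO ()
main = print (ackStack 2 3)  -- prints 9
```

Every call to go is a tail call, so the pending work lives in the heap-allocated list rather than on the Haskell stack.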

Answer 2

It seems that there is some kind of bug involved. What GHC version are you using?

With GHC 7, I get the same behavior as you do. The program consumes all available memory without producing any output.

However, if I compile it with GHC 6.12.1, using just ghc --make -O2 Ack.hs, it works perfectly. It computes the result in 10.8s on my computer, while the plain C version takes 7.8s.

I suggest you report this bug on the GHC web site.

Answer 3

This version uses some properties of the Ackermann function. It's not equivalent to the other versions, but it's fast:

ackermann :: Int -> Int -> Int
ackermann 0 n = n + 1
ackermann m 0 = ackermann (m - 1) 1
ackermann 1 n = n + 2
ackermann 2 n = 2 * n + 3
ackermann 3 n = 2 ^ (n + 3) - 3
ackermann m n = ackermann (m - 1) (ackermann m (n - 1))
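The shortcut equations for m = 1, 2 and 3 can be checked against the plain recursive definition on small inputs. This is only a sanity-check sketch; the names naive, closed and agree are made up for illustration.

```haskell
module Main where

-- Plain recursive definition, as in the question.
naive :: Int -> Int -> Int
naive 0 n = n + 1
naive m 0 = naive (m - 1) 1
naive m n = naive (m - 1) (naive m (n - 1))

-- Closed forms used as shortcuts in the version above.
closed :: Int -> Int -> Int
closed 0 n = n + 1
closed 1 n = n + 2
closed 2 n = 2 * n + 3
closed 3 n = 2 ^ (n + 3) - 3
closed _ _ = error "closed: only defined for m <= 3"

-- True if both definitions agree on a small grid of arguments.
agree :: Bool
agree = and [ naive m n == closed m n | m <- [0 .. 3], n <- [0 .. 6] ]

main :: IO ()
main = print agree  -- prints True
```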

Edit: And this is a version with memoization. We see that it's easy to memoize a function in Haskell; the only change is at the call site:

import Data.Function.Memoize

ackermann :: Integer -> Integer -> Integer
ackermann 0 n = n + 1
ackermann m 0 = ackermann (m - 1) 1
ackermann 1 n = n + 2
ackermann 2 n = 2 * n + 3
ackermann 3 n = 2 ^ (n + 3) - 3
ackermann m n = ackermann (m - 1) (ackermann m (n - 1))

main :: IO ()
main = print $ memoize2 ackermann 4 2
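One caveat, stated as an assumption about how memoize2 is applied here rather than as a fact about the library: wrapping only the call site would cache the outermost call, while the recursive calls inside ackermann would still go to the unmemoized function. A sketch that threads a cache through the recursion itself is shown below. It assumes the containers package (Data.Map.Strict), and the names Memo and ackMemo are hypothetical.

```haskell
module Main where

import qualified Data.Map.Strict as M

-- | Cache from (m, n) to the already computed result.
type Memo = M.Map (Int, Integer) Integer

-- | Ackermann that looks up every call, including recursive ones, in the
-- cache, and returns the updated cache alongside the result.
ackMemo :: Int -> Integer -> Memo -> (Integer, Memo)
ackMemo m n memo =
  case M.lookup (m, n) memo of
    Just v  -> (v, memo)
    Nothing ->
      let (v, memo') = step
      in  (v, M.insert (m, n) v memo')
  where
    step
      | m == 0    = (n + 1, memo)
      | n == 0    = ackMemo (m - 1) 1 memo
      | otherwise =
          let (inner, memo1) = ackMemo m (n - 1) memo
          in  ackMemo (m - 1) inner memo1

main :: IO ()
main = print (fst (ackMemo 2 3 M.empty))  -- prints 9
```

This is meant for small arguments only; it still recurses as deeply as the plain definition does on a cache miss.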

Answer 4

The following is an idiomatic version that takes advantage of Haskell's laziness and GHC's optimisation of constant top-level expressions.

acks :: [[Int]]
acks = [ [ case (m, n) of
                (0, _) -> n + 1
                (_, 0) -> acks !! (m - 1) !! 1
                (_, _) -> acks !! (m - 1) !! (acks !! m !! (n - 1))
         | n <- [0..] ]
       | m <- [0..] ]

main :: IO ()
main = print $ acks !! 4 !! 1

Here, we're lazily building a matrix of all the values of the Ackermann function. As a result, subsequent uses of acks will not recompute anything (i.e. evaluating acks !! 4 !! 1 a second time will not double the running time).

Although this is not the fastest solution, it looks a lot like the naïve implementation, it is very efficient in terms of memory use, and it recasts one of Haskell's weirder features (laziness) as a strength.

Answer 5

I don't see that this is a bug at all; ghc just isn't taking advantage of the fact that it knows that 4 and 1 are the only arguments the function will ever be called with. That is, to put it bluntly, it doesn't cheat. It also doesn't do constant math for you, so if you had written main = print $ ack (2+2) 1, it wouldn't have calculated that 2+2 = 4 until runtime. ghc has much more important things to think about. Help for the latter difficulty is available, if you care for it, at http://hackage.haskell.org/package/const-math-ghc-plugin .

So ghc is helped if you do a little math; e.g. this is at least a hundred times as fast as your C program with 4 and 1 as arguments. But try it with 4 and 2:

main = print $ ack 4 2 where

    ack :: Int -> Integer -> Integer
    ack 0 n = n + 1
    ack 1 n = n + 2 
    ack 2 n = 2 * n + 3
    ack m 0 = ack (m-1) 1
    ack m n = ack (m-1) (ack m (n-1) )

This will give the right answer, all ~20,000 digits, in under a tenth of a second, whereas gcc, with your algorithm, will take forever to give the wrong answer.
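The digit count can be checked directly. Since ack 4 1 = 65533 and ack 3 n = 2^(n+3) - 3, it follows that ack 4 2 = ack 3 65533 = 2^65536 - 3. That value cannot fit in a fixed-width C int, which is why the answer needs Integer. A small sketch (the names ack42 and digitCount are made up for illustration):

```haskell
module Main where

-- ack 4 2 = ack 3 (ack 4 1) = ack 3 65533 = 2^(65533 + 3) - 3
ack42 :: Integer
ack42 = 2 ^ (65536 :: Int) - 3

-- Number of decimal digits in ack 4 2.
digitCount :: Int
digitCount = length (show ack42)

main :: IO ()
main = print digitCount  -- prints 19729
```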

Answer 6

Writing the algorithm in Haskell in a way that looks similar to the way you wrote it in C does not give the same algorithm, because the semantics of recursion are quite different.

Here is a version using the same mathematical algorithm, but where we represent calls to the Ackermann function symbolically using a data type. That way, we can control the semantics of the recursion more precisely.

When compiled with optimization, this version runs in constant memory, but it is slow: about 4.5 minutes in an environment similar to yours. But I'm sure it could be modified to be much faster. This is just to give the idea.

data Ack = Ack !Int

ack :: Int -> Int -> Int
ack m n = length . ackR $ Ack m : replicate n (Ack 0)
  where
    ackR n@(Ack 0 : _) = n
    ackR n             = ackR $ ack' n

    ack' [] = []
    ack' (Ack 0 : n) = Ack 0 : ack' n
    ack' [Ack m]     = [Ack (m-1), Ack 0]
    ack' (Ack m : n) = Ack (m-1) : ack' (Ack m : decr n)

    decr (Ack 0 : n) = n
    decr n           = decr $ ack' n

Answer 7

This performance issue (apart from the GHC RTS bug, obviously) seems to be fixed now on OS X 10.8 after the Apple XCode update to 4.6.2. I can still reproduce it on Linux (though I have been testing with the GHC LLVM backend there), but no longer on OS X. After I updated XCode to 4.6.2, the new version seems to have affected the GHC backend code generation for Ackermann substantially (from what I remember from looking at object dumps before the update). I was able to reproduce the performance issue on the Mac before the XCode update; I don't have the numbers, but they were surely quite bad. So it seems that the XCode update improved the GHC code generation for Ackermann.

Now, the C and GHC versions are quite close. C code:

int ack(int m,int n){

  if(m==0) return n+1;
  if(n==0) return ack(m-1,1);
  return ack(m-1, ack(m,n-1));

}

Time to execute ack(4,1):

GCC 4.8.0: 2.94s
Clang 4.1: 4s

Haskell code:

ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

Time to execute ack 4 1 (with +RTS -kc1M):

GHC 7.6.1 Native: 3.191s
GHC 7.6.1 LLVM: 3.8s

All were compiled with the -O2 flag (plus the -rtsopts flag for GHC, for the RTS bug workaround). It is quite a head scratcher, though. Updating XCode seems to have made a big difference to the optimization of Ackermann in GHC.
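The answer does not say how the timings above were measured. For anyone who wants to repeat the comparison from inside the program, here is a rough sketch that uses getCPUTime from base on a much smaller input. The helper name timedAck is hypothetical, and the reported time will vary by machine and flags.

```haskell
module Main where

import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

ack :: Int -> Int -> Int
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

-- | Compute ack m n and report the CPU time used, in seconds.
-- 'evaluate' forces the result between the two clock readings;
-- getCPUTime returns picoseconds, hence the division by 1e12.
timedAck :: Int -> Int -> IO (Int, Double)
timedAck m n = do
  start  <- getCPUTime
  result <- evaluate (ack m n)
  end    <- getCPUTime
  return (result, fromIntegral (end - start) / 1e12)

main :: IO ()
main = do
  (result, secs) <- timedAck 3 9
  putStrLn ("ack 3 9 = " ++ show result ++ " in " ++ show secs ++ " s (CPU)")
```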


7 answers

Answer 1: 36, accepted, 2013-04-20 03:00:28
Answer 2: 13, 2013-04-20 16:36:10
Answer 3: 7, 2013-04-20 10:01:11
Answer 4: 5, 2013-04-20 03:10:43
Answer 5: 4, 2013-04-20 21:04:01
Answer 6: 4, 2013-04-23 18:09:41
Answer 7: 3, 2013-04-27 03:05:32
