Naive Fibonacci in C vs Haskell

How can I make the evaluation of g (fib) completely strict? (I know that this exponential solution is not optimal. I would like to know how to make the recursion completely strict, if that is possible.)

Haskell

g :: Int -> Int
g 0 = 0
g 1 = 1
g x = g(x-1) + g(x-2)
main = print $ g 42

The goal is for it to run approximately as fast as the naive C solution:

C

#include <stdio.h>

long f(int x)
{
    if (x == 0) return 0;
    if (x == 1) return 1;
    return f(x-1) + f(x-2);
}

int main(void)
{
    printf("%ld\n", f(42));
    return 0;
}

Note: this fib recursion is used only as a very simple example. I know that there are dozens of better algorithms. But there definitely are recursive algorithms that do not have such simple and more efficient alternatives.

The answer is that GHC makes the evaluation completely strict on its own (when you give it the chance by compiling with optimisations). The original code produces the core

Rec {
Main.$wg [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType L]
Main.$wg =
  \ (ww_s1JE :: GHC.Prim.Int#) ->
    case ww_s1JE of ds_XsI {
      __DEFAULT ->
        case Main.$wg (GHC.Prim.-# ds_XsI 1) of ww1_s1JI { __DEFAULT ->
        case Main.$wg (GHC.Prim.-# ds_XsI 2) of ww2_X1K4 { __DEFAULT ->
        GHC.Prim.+# ww1_s1JI ww2_X1K4
        }
        };
      0 -> 0;
      1 -> 1
    }
end Rec }

which, as you can see if you know GHC's core, is completely strict and uses unboxed raw machine integers.

(Unfortunately, the machine code gcc produces from the C source is just plain faster.)

GHC's strictness analyser is rather good, and in simple cases like this one, where no polymorphism is involved and the function is not too complicated, you can count on it finding that it can unbox all values to produce a worker using unboxed Int#s.
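To make the worker/wrapper split concrete, here is a minimal hand-written sketch of roughly what the core above amounts to. The names wg and g are only illustrative (GHC generates Main.$wg itself), and you would not normally write this by hand; it assumes GHC with the MagicHash extension.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Int (I#), Int#, (+#), (-#))

-- Hand-written analogue of the worker: it takes and returns raw
-- machine integers (Int#), so no boxes and no thunks are involved.
wg :: Int# -> Int#
wg 0# = 0#
wg 1# = 1#
wg n  = wg (n -# 1#) +# wg (n -# 2#)

-- Wrapper: unbox the argument, call the worker, re-box the result.
g :: Int -> Int
g (I# n) = I# (wg n)

main :: IO ()
main = print (g 20)  -- prints 6765
```

Values of type Int# cannot be lazy, so everything in wg is evaluated eagerly by construction.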

However, in cases like this, there is more to producing fast code than just operating on machine types. The assembly produced by the native code generator, as well as by the LLVM backend, is basically a direct translation of the code: check whether the argument is 0 or 1, and if not, call the function twice and add the results. Both produce some entry and exit code I don't understand, and on my box the native code generator produces the faster code.

For the C code, clang -O3 produces the straightforward assembly with less cruft and using more registers,

.Ltmp8:
    .cfi_offset %r14, -24
    movl        %edi, %ebx
    xorl        %eax, %eax
    testl       %ebx, %ebx
    je          .LBB0_4
# BB#1:
    cmpl        $1, %ebx
    jne         .LBB0_3
# BB#2:
    movl        $1, %eax
    jmp         .LBB0_4
.LBB0_3:
    leal        -1(%rbx), %edi
    callq       recfib
    movq        %rax, %r14
    addl        $-2, %ebx
    movl        %ebx, %edi
    callq       recfib
    addq        %r14, %rax
.LBB0_4:
    popq        %rbx
    popq        %r14
    popq        %rbp
    ret

(which for some reason performs much better on my system today than it did yesterday). Much of the difference in performance between the code produced from the Haskell source and from the C source comes from the use of registers in the latter where the former uses indirect addressing; the core of the algorithm is the same in both.

gcc without any optimisations produces essentially the same, using some indirect addressing, but less than what GHC produced with either the NCG or the LLVM backend. With -O1, ditto, but with even less indirect addressing. With -O2, you get a transformation so that the assembly doesn't easily map back to the source, and with -O3, gcc produces the fairly amazing

.LFB0:
    .cfi_startproc
    pushq   %r15
    .cfi_def_cfa_offset 16
    .cfi_offset 15, -16
    pushq   %r14
    .cfi_def_cfa_offset 24
    .cfi_offset 14, -24
    pushq   %r13
    .cfi_def_cfa_offset 32
    .cfi_offset 13, -32
    pushq   %r12
    .cfi_def_cfa_offset 40
    .cfi_offset 12, -40
    pushq   %rbp
    .cfi_def_cfa_offset 48
    .cfi_offset 6, -48
    pushq   %rbx
    .cfi_def_cfa_offset 56
    .cfi_offset 3, -56
    subq    $120, %rsp
    .cfi_def_cfa_offset 176
    testl   %edi, %edi
    movl    %edi, 64(%rsp)
    movq    $0, 16(%rsp)
    je      .L2
    cmpl    $1, %edi
    movq    $1, 16(%rsp)
    je      .L2
    movl    %edi, %eax
    movq    $0, 16(%rsp)
    subl    $1, %eax
    movl    %eax, 108(%rsp)
.L3:
    movl    108(%rsp), %eax
    movq    $0, 32(%rsp)
    testl   %eax, %eax
    movl    %eax, 72(%rsp)
    je      .L4
    cmpl    $1, %eax
    movq    $1, 32(%rsp)
    je      .L4
    movl    64(%rsp), %eax
    movq    $0, 32(%rsp)
    subl    $2, %eax
    movl    %eax, 104(%rsp)
.L5:
    movl    104(%rsp), %eax
    movq    $0, 24(%rsp)
    testl   %eax, %eax
    movl    %eax, 76(%rsp)
    je      .L6
    cmpl    $1, %eax
    movq    $1, 24(%rsp)
    je      .L6
    movl    72(%rsp), %eax
    movq    $0, 24(%rsp)
    subl    $2, %eax
    movl    %eax, 92(%rsp)
.L7:
    movl    92(%rsp), %eax
    movq    $0, 40(%rsp)
    testl   %eax, %eax
    movl    %eax, 84(%rsp)
    je      .L8
    cmpl    $1, %eax
    movq    $1, 40(%rsp)
    je      .L8
    movl    76(%rsp), %eax
    movq    $0, 40(%rsp)
    subl    $2, %eax
    movl    %eax, 68(%rsp)
.L9:
    movl    68(%rsp), %eax
    movq    $0, 48(%rsp)
    testl   %eax, %eax
    movl    %eax, 88(%rsp)
    je      .L10
    cmpl    $1, %eax
    movq    $1, 48(%rsp)
    je      .L10
    movl    84(%rsp), %eax
    movq    $0, 48(%rsp)
    subl    $2, %eax
    movl    %eax, 100(%rsp)
.L11:
    movl    100(%rsp), %eax
    movq    $0, 56(%rsp)
    testl   %eax, %eax
    movl    %eax, 96(%rsp)
    je      .L12
    cmpl    $1, %eax
    movq    $1, 56(%rsp)
    je      .L12
    movl    88(%rsp), %eax
    movq    $0, 56(%rsp)
    subl    $2, %eax
    movl    %eax, 80(%rsp)
.L13:
    movl    80(%rsp), %eax
    movq    $0, 8(%rsp)
    testl   %eax, %eax
    movl    %eax, 4(%rsp)
    je      .L14
    cmpl    $1, %eax
    movq    $1, 8(%rsp)
    je      .L14
    movl    96(%rsp), %r15d
    movq    $0, 8(%rsp)
    subl    $2, %r15d
.L15:
    xorl    %r14d, %r14d
    testl   %r15d, %r15d
    movl    %r15d, %r13d
    je      .L16
    cmpl    $1, %r15d
    movb    $1, %r14b
    je      .L16
    movl    4(%rsp), %r12d
    xorb    %r14b, %r14b
    subl    $2, %r12d
    .p2align 4,,10
    .p2align 3
.L17:
    xorl    %ebp, %ebp
    testl   %r12d, %r12d
    movl    %r12d, %ebx
    je      .L18
    cmpl    $1, %r12d
    movb    $1, %bpl
    je      .L18
    xorb    %bpl, %bpl
    jmp     .L20
    .p2align 4,,10
    .p2align 3
.L21:
    cmpl    $1, %ebx
    je      .L58
.L20:
    leal    -1(%rbx), %edi
    call    recfib
    addq    %rax, %rbp
    subl    $2, %ebx
    jne     .L21
.L18:
    addq    %rbp, %r14
    subl    $2, %r13d
    je      .L16
    subl    $2, %r12d
    cmpl    $1, %r13d
    jne     .L17
    addq    $1, %r14
.L16:
    addq    %r14, 8(%rsp)
    subl    $2, 4(%rsp)
    je      .L14
    subl    $2, %r15d
    cmpl    $1, 4(%rsp)
    jne     .L15
    addq    $1, 8(%rsp)
.L14:
    movq    8(%rsp), %rax
    addq    %rax, 56(%rsp)
    subl    $2, 96(%rsp)
    je      .L12
    subl    $2, 80(%rsp)
    cmpl    $1, 96(%rsp)
    jne     .L13
    addq    $1, 56(%rsp)
.L12:
    movq    56(%rsp), %rax
    addq    %rax, 48(%rsp)
    subl    $2, 88(%rsp)
    je      .L10
    subl    $2, 100(%rsp)
    cmpl    $1, 88(%rsp)
    jne     .L11
    addq    $1, 48(%rsp)
.L10:
    movq    48(%rsp), %rax
    addq    %rax, 40(%rsp)
    subl    $2, 84(%rsp)
    je      .L8
    subl    $2, 68(%rsp)
    cmpl    $1, 84(%rsp)
    jne     .L9
    addq    $1, 40(%rsp)
.L8:
    movq    40(%rsp), %rax
    addq    %rax, 24(%rsp)
    subl    $2, 76(%rsp)
    je      .L6
    subl    $2, 92(%rsp)
    cmpl    $1, 76(%rsp)
    jne     .L7
    addq    $1, 24(%rsp)
.L6:
    movq    24(%rsp), %rax
    addq    %rax, 32(%rsp)
    subl    $2, 72(%rsp)
    je      .L4
    subl    $2, 104(%rsp)
    cmpl    $1, 72(%rsp)
    jne     .L5
    addq    $1, 32(%rsp)
.L4:
    movq    32(%rsp), %rax
    addq    %rax, 16(%rsp)
    subl    $2, 64(%rsp)
    je      .L2
    subl    $2, 108(%rsp)
    cmpl    $1, 64(%rsp)
    jne     .L3
    addq    $1, 16(%rsp)
.L2:
    movq    16(%rsp), %rax
    addq    $120, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 56
    popq    %rbx
    .cfi_def_cfa_offset 48
    popq    %rbp
    .cfi_def_cfa_offset 40
    popq    %r12
    .cfi_def_cfa_offset 32
    popq    %r13
    .cfi_def_cfa_offset 24
    popq    %r14
    .cfi_def_cfa_offset 16
    popq    %r15
    .cfi_def_cfa_offset 8
    ret
    .p2align 4,,10
    .p2align 3
.L58:
    .cfi_restore_state
    addq    $1, %rbp
    jmp     .L18
    .cfi_endproc

which is much faster than anything else tested. gcc unrolled the algorithm to a remarkable depth, which neither GHC nor LLVM did, and that makes a huge difference here.

Start by using a better algorithm!
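The innermost loop in that listing (labels .L20 and .L21) keeps the call on x-1 but turns the call on x-2 into a loop that steps the argument down by 2 and accumulates. A rough source-level sketch of that single step, written in Haskell for comparison with g, might look like the following; gLoop is a hypothetical name, non-negative arguments are assumed, and the nesting gcc applies on top of this is not reproduced.

```haskell
module Main (main) where

-- Reference: the naive doubly recursive definition from the question.
g :: Int -> Int
g 0 = 0
g 1 = 1
g x = g (x - 1) + g (x - 2)

-- Hypothetical rewrite: keep the call on x-1, but replace the call on
-- x-2 by a loop that steps x down by 2 and accumulates the results.
gLoop :: Int -> Int
gLoop = go 0
  where
    go acc x
      | x < 2     = acc + x   -- base cases: g 0 = 0, g 1 = 1
      | otherwise = go (acc + gLoop (x - 1)) (x - 2)

main :: IO ()
main = print (map gLoop [0 .. 10] == map g [0 .. 10])  -- prints True
```

This halves the number of real calls per level; gcc appears to apply the same idea repeatedly, which is where the deeply nested labels come from.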

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- (!!) binds tighter than (-), so `fibs !! n-1` would parse as (fibs !! n) - 1
fib n = fibs !! n

fib 42 will give you an answer much faster.

It's much more important to use a better algorithm than to make minor speed tweaks.

You can happily and quickly calculate fib 123456 in ghci (i.e. interpreted, not even compiled) with this definition (it's 25801 digits long). You might get your C code to calculate that faster, but you'll take quite a while writing it. This took me hardly any time at all. I spent much more time writing this post!
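As a small usage sketch, here is the lazy-list definition as a complete program. The type signatures are my addition, and fibs is indexed directly so that fib 0 = 0, matching g in the question. Integer is used because the larger results do not fit in a machine Int.

```haskell
module Main (main) where

-- Infinite, lazily built list of Fibonacci numbers; each element is
-- computed once and then shared, which is what makes this fast.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Index into the shared list; fib 0 = 0 and fib 1 = 1, as with g.
fib :: Int -> Integer
fib n = fibs !! n

main :: IO ()
main = do
  print (fib 42)                      -- prints 267914296
  print (length (show (fib 123456)))  -- number of decimal digits
```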

Morals:

  1. Use the right algorithm!
  2. Haskell lets you write clean versions of code, memoising answers simply.
  3. It's sometimes easier to define an infinite list of answers and grab the one you want than to write some looping version that updates values.
  4. Haskell is awesome.

This is completely strict.

g :: Int -> Int
g 0 = 0
g 1 = 1
g x = a `seq` b `seq` a + b where
   a = g $! x-1
   b = g $! x-2
main = print $! g 42

$! is the same as $ (low-precedence function application) except that it is strict in the function argument.

You will want to compile with -O2 as well, although I am curious as to why you don't want to use a better algorithm.

The function is already completely strict.
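A minimal sketch of the difference, using a hypothetical function constOne that ignores its argument so that laziness becomes observable:

```haskell
module Main (main) where

import Control.Exception (ErrorCall, evaluate, try)

-- A function that ignores its argument, so laziness is observable.
constOne :: Int -> Int
constOne _ = 1

main :: IO ()
main = do
  -- ($) passes the argument unevaluated; it is never forced.
  print (constOne $ undefined)
  -- ($!) forces the argument to weak head normal form first, so the
  -- undefined is hit before constOne is entered.
  r <- try (evaluate (constOne $! undefined)) :: IO (Either ErrorCall Int)
  putStrLn (either (const "argument was forced") show r)
```

For g the two operators behave the same, because g inspects its argument immediately in order to choose a clause.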

The usual definition of a function being strict is that if you give it undefined input, its result will itself be undefined. I assume from context that you are thinking of a different notion of strictness, namely that a function is strict if it evaluates its arguments before producing a result. But usually the only way to check whether a value is undefined is to evaluate it, so the two are often equivalent.

According to the first definition, g is certainly strict, since it must check whether the argument is equal to zero before knowing which branch of the definition to use, so if the argument is undefined, g itself will choke when it tries to read it.

According to a more informal definition, well, what could g do wrong? The first two clauses are obviously fine, and mean that by the time we get to the third clause, we must already have evaluated x. Now, in the third clause, we have an addition of two function calls. More completely, we have the following tasks:

  1. subtract 1 from x
  2. subtract 2 from x
  3. call g with the result of 1.
  4. call g with the result of 2.
  5. add the results of 3. and 4. together.

Laziness can mess with the order of these operations a little, but since both + and g need the values of their arguments before they can run their code, nothing can really be delayed by any significant amount, and the compiler is certainly free to run these operations in strict order if it can show that + is strict (it's built in, so that shouldn't be too hard) and that g is strict (which it obviously is). So any reasonable optimising compiler will not have too much trouble with this, and furthermore any non-optimising compiler will not incur any significant overhead by doing the completely naive thing (it's certainly not like the situation of foldl (+) 0 [1 .. 1000000]).

The lesson is that when a function immediately compares its argument against something, that function is already strict, and any decent compiler will be able to exploit that fact to eliminate the usual overheads of laziness. That does not mean it will be able to eliminate other overheads, like the time taken to start the runtime system, that tend to make Haskell programs a little slower than C programs. If you're just looking at performance numbers, there's a lot more going on than whether your program is strict or lazy.
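For contrast, the foldl case mentioned above is one where laziness really can delay work: nothing inspects the accumulator until the end of the list. A minimal sketch, using foldl' from Data.List as the strict counterpart (that substitution is my illustration, not something the question used):

```haskell
module Main (main) where

import Data.List (foldl')

main :: IO ()
main = do
  -- Lazy left fold: without optimisation this can build a chain of
  -- one million suspended additions before any of them is evaluated.
  print (foldl (+) 0 [1 .. 1000000 :: Int])
  -- Strict left fold: the accumulator is forced at every step.
  print (foldl' (+) 0 [1 .. 1000000 :: Int])
```

Both lines print 500000500000; the difference is only in how much memory the unevaluated accumulator may occupy along the way.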
