
Reasoning about performance in Haskell

The following two Haskell programs for computing the n'th term of the Fibonacci sequence have greatly different performance characteristics:

fib1 n =
  case n of
    0 -> 1
    1 -> 1
    x -> (fib1 (x-1)) + (fib1 (x-2))

fib2 n = fibArr !! n where
  fibArr = 1:1:[a + b | (a, b) <- zip fibArr (tail fibArr)]

They are very close to mathematically identical, but fib2 uses the list notation to memoize its intermediate results, while fib1 has explicit recursion. Despite the potential for the intermediate results to be cached in fib1, the execution time gets to be a problem even for fib1 25, suggesting that the recursive steps are always evaluated. Does referential transparency contribute anything to Haskell's performance? How can I know ahead of time if it will or won't?

This is just an example of the sort of thing I'm worried about. I'd like to hear any thoughts about overcoming the difficulty inherent in reasoning about the performance of a lazily-executed, functional programming language.


Summary: I'm accepting 3lectrologos's answer, because the point that you don't reason so much about the language's performance as about your compiler's optimizations seems to be extremely important in Haskell - more so than in any other language I'm familiar with. I'm inclined to say that the importance of the compiler is the factor that differentiates reasoning about performance in lazy, functional languages from reasoning about the performance of any other type.


Addendum: Anyone happening on this question may want to look at the slides from Johan Tibell's talk about high performance Haskell.

In your particular Fibonacci example, it's not very hard to see why the second one should run faster (although you haven't specified what f2 is).

It's mainly an algorithmic issue:

  • fib1 implements the purely recursive algorithm and (as far as I know) Haskell has no mechanism for "implicit memoization".
  • fib2 uses explicit memoization (using the fibArr list to store previously computed values).

In general, it's much harder to make performance assumptions for a lazy language like Haskell than for an eager one. Nevertheless, if you understand the underlying mechanisms (especially for laziness) and gather some experience, you will be able to make some "predictions" about performance.

Referential transparency (potentially) increases performance in (at least) two ways:

  • First, you (as a programmer) can be sure that two calls to the same function will always return the same result, so you can exploit this in various cases to benefit performance.
  • Second (and more importantly), the Haskell compiler can be sure of the above fact, and this may enable many optimizations that can't be enabled in impure languages (if you've ever written a compiler or have any experience in compiler optimizations, you are probably aware of the importance of this).
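A small illustration of the first point. Here `expensive` is a hypothetical stand-in for some costly pure function; because it is referentially transparent, the two calls in `unshared` must return the same value, so sharing one result via `let` cannot change the program's meaning - only its cost:

```haskell
-- A hypothetical costly pure function (just for illustration).
expensive :: Int -> Int
expensive n = sum [1 .. n]

-- Calls expensive twice; both calls are guaranteed to agree.
unshared :: Int -> Int
unshared n = expensive n + expensive n

-- Same meaning, but computes the result once and shares it.
shared :: Int -> Int
shared n = let r = expensive n in r + r
```

This is exactly the kind of rewrite (common subexpression elimination) that purity makes safe for the compiler as well.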

If you want to read more about the reasoning behind the design choices (laziness, pureness) of Haskell, I'd suggest reading this.

Reasoning about performance is generally hard in Haskell and lazy languages in general, although not impossible. Some techniques are covered in Chris Okasaki's Purely Functional Data Structures (also available online in a previous version).

Another way to ensure performance is to fix the evaluation order, either using annotations or continuation passing style. That way you get to control when things are evaluated.
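As a minimal sketch of the annotation approach (the function name is made up for illustration), the strict application operator ($!) forces its argument to weak head normal form before the call, so the accumulator is evaluated at every step instead of building up a chain of thunks:

```haskell
-- Sum 1..n with an explicitly strict accumulator.
-- ($!) pins down the evaluation order: acc + i is computed
-- before each recursive call, not deferred as a thunk.
sumStrict :: Int -> Int
sumStrict n = go 0 1
  where
    go acc i
      | i > n     = acc
      | otherwise = (go $! acc + i) (i + 1)
```

Bang patterns or `seq` would achieve the same thing; the point is that you decide when evaluation happens rather than leaving it to laziness.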

In your example you might calculate the numbers "bottom up" and pass the previous two numbers along to each iteration:

fib n = fib_iter(1,1,n)
    where
      fib_iter(a,b,0) = a
      fib_iter(a,b,1) = a
      fib_iter(a,b,n) = fib_iter(a+b,a,n-1)

This results in a linear time algorithm.
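One caveat worth knowing: even in this linear version the pending (a+b) additions can accumulate as unevaluated thunks. A sketch of the same iteration with `seq` forcing each sum as it is produced (fibIter is my name for it, not from the answer above):

```haskell
-- Bottom-up Fibonacci with a strict accumulator: each new sum is
-- forced with `seq` before recursing, so no thunk chain builds up.
fibIter :: Int -> Integer
fibIter n = go 1 1 n
  where
    go a _ 0 = a
    go a _ 1 = a
    go a b k = let s = a + b in s `seq` go s a (k - 1)
```

It computes the same sequence as fib_iter (fibIter 10 matches fib1 10) but in constant space for the accumulators.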

Whenever you have a dynamic programming algorithm where each result relies on the N previous results, you can use this technique. Otherwise you might have to use an array or something completely different.
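A sketch of the array alternative just mentioned (fibMemo is a hypothetical name): a lazy immutable array where each entry is defined in terms of earlier entries. Laziness fills the entries in on demand, and lookups are O(1), which generalizes to dynamic programming recurrences that don't just look back a fixed number of steps:

```haskell
import Data.Array

-- Memoize fib with a lazy array; arr ! i forces f i at most once.
fibMemo :: Int -> Integer
fibMemo n = arr ! n
  where
    arr = listArray (0, n) [f i | i <- [0 .. n]]
    f 0 = 1
    f 1 = 1
    f i = arr ! (i - 1) + arr ! (i - 2)
```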

Your implementation of fib2 uses memoization, but each time you call fib2 it rebuilds the "whole" result. Turn on ghci time and size profiling:

Prelude> :set +s

If it was doing memoisation "between" calls, the subsequent calls would be faster and use no memory. Call fib2 20000 twice and see for yourself.

By comparison, a more idiomatic version where you define the exact mathematical identity:

-- the infinite list of all fibs numbers.
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

memoFib n = fibs !! n

actually does use memoisation, explicitly as you see. If you run memoFib 20000 twice you'll see the time and space taken the first time, then the second call is instantaneous and takes no memory. No magic and no implicit memoization like a comment might have hinted at.

Now about your original question: optimizing and reasoning about performance in Haskell...

I wouldn't call myself an expert in Haskell; I have only been using it for 3 years, 2 of which at my workplace, but I did have to optimize and get to understand how to reason somewhat about its performance.

As mentioned in another post, laziness is your friend and can help you gain performance; however, YOU have to be in control of what is lazily evaluated and what is strictly evaluated.

Check this comparison of foldl vs foldr.

foldl actually stores "how" to compute the value, i.e. it is lazy. In some cases being lazy saves you time and space, like the "infinite" fibs. The infinite fibs doesn't generate all of them but knows how. When you know you will need the value you might as well just get it, "strictly" speaking... That's where strictness annotations are useful, to give you back control.
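A minimal sketch of that foldl comparison, using foldl' from Data.List as the strict counterpart: both folds produce the same sum, but foldl suspends every (+) while foldl' forces the accumulator at each step and so runs in constant space:

```haskell
import Data.List (foldl')

-- Lazy left fold: builds a chain of suspended (+) applications.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict left fold: the accumulator is evaluated at every step.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
```

On a small list both behave fine; on a list of millions of elements the lazy version can blow the stack while the strict one doesn't.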

I recall reading many times that in Lisp you have to "minimize" consing.

Understanding what is strictly evaluated and how to force it is important, but so is understanding how much "trashing" you do to the memory. Remember Haskell is immutable; that means that updating a "variable" is actually creating a copy with the modification. Prepending with (:) is vastly more efficient than appending with (++) because, unlike (++), (:) does not copy memory. Whenever a big atomic block is updated (even for a single char) the whole block needs to be copied to represent the "updated" version. The way you structure data and update it can have a big impact on performance. The ghc profiler is your friend and will help you spot these. Sure, the garbage collector is fast, but not having it do anything is faster!
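A sketch of the (:) vs (++) point (the function names here are made up): building a list front-to-back with (++) copies the accumulator on every step, which is quadratic overall, while prepending with (:) and reversing once at the end is linear:

```haskell
-- Quadratic: (acc ++ [x]) copies all of acc on each iteration.
buildAppend :: Int -> [Int]
buildAppend n = go [] [1 .. n]
  where
    go acc (x:xs) = go (acc ++ [x]) xs
    go acc []     = acc

-- Linear: (:) is O(1), plus one reverse at the end.
buildPrepend :: Int -> [Int]
buildPrepend n = reverse (go [] [1 .. n])
  where
    go acc (x:xs) = go (x : acc) xs
    go acc []     = acc
```

Both produce the same list; only the allocation behaviour differs, and the heap profiler will show it clearly on large n.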

Cheers 干杯

Aside from the memoization issue, fib1 also uses non-tailcall recursion. Tailcall recursion can be re-factored automatically into a simple goto and perform very well, but the recursion in fib1 cannot be optimized in this way, because you need the stack frame from each instance of fib1 in order to calculate the result. If you rewrote fib1 to pass a running total as an argument, thus allowing a tail call instead of needing to keep the stack frame for the final addition, the performance would improve immensely. But not as much as the memoized example, of course :)

Since allocation is a major cost in any functional language, an important part of understanding performance is to understand when objects are allocated, how long they live, when they die, and when they are reclaimed. To get this information you need a heap profiler. It's an essential tool, and luckily GHC ships with a good one.

For more information, read Colin Runciman's papers.
