Haskell Fibonacci sequence performance depending on methodology

I was trying out different approaches to getting the number at a given index of the Fibonacci sequence. They can basically be divided into two categories:

  • building a list and querying an index
  • using variables (separate or tupled, without a list)

I picked an example of each:

fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
    where fibs1' = 0 : scanl (+) 1 fibs1'

fib2 :: Int -> Integer
fib2 n = fib2' 1 1 n where
    fib2' _ b 2 = b
    fib2' a b n = fib2' b (a + b) (n - 1)

fibs1:

real    0m2.356s
user    0m2.310s
sys     0m0.030s

fib2:

real    0m0.671s
user    0m0.667s
sys     0m0.000s

Both were compiled with 64-bit GHC 7.6.1 and -O2 -fllvm. Their core dumps are very similar in length, but they differ in the parts that I'm not very proficient at interpreting.

I was not surprised that fibs1 failed for n = 350000 (Stack space overflow). However, I am not comfortable with the fact that it used that much memory.

I would like to clear some things up:

  1. Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?
  2. Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?

EDIT: Sorry, I mixed up the speed results; fixed. Two of my three doubts are still valid, though ;).

Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?

fibs1 uses a lot of memory and is slow because scanl is lazy: it doesn't evaluate the list elements, so

fibs1' = 0 : scanl (+) 1 fibs1'

produces

0 : scanl (+) 1 (0 : more)
0 : 1 : let f2 = 1+0 in scanl (+) f2 (1 : more')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in scanl (+) f3 (f2 : more'')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in f3 : let f4 = f3+f2 in scanl (+) f4 (f3 : more''')

etc. So you rather quickly get a huge nested thunk. When that thunk is evaluated, it is pushed on the stack, and at some point between 250000 and 350000, it becomes too big for the default stack.

And since each list element holds a reference to the previous one while it is not evaluated, the beginning of the list cannot be garbage-collected.

If you use a strict scan,

fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
  where
    fibs1' = 0 : scanl' (+) 1 fibs1'
    scanl' f a (x:xs) = let x' = f a x in x' `seq` (a : scanl' f x' xs)
    scanl' _ a [] = [a]

when the k-th list cell is produced, its value is already evaluated, so it doesn't refer to a previous one; hence the list can be garbage-collected (assuming nothing else holds a reference to it) as it is traversed.

With that implementation, the list version is about as fast and lean as fib2. It still needs to allocate list cells, so it allocates a bit more and is possibly a tiny bit slower, but the difference is minute: the Fibonacci numbers become so large that the list construction overhead becomes negligible.
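For readers on a newer toolchain than the GHC 7.6.1 used in the question, the hand-written strict scan can probably be swapped for the library one. The following is a minimal sketch, assuming base 4.8 or later, where Data.List exports scanl'; the name fibStrict is made up for this example. Note that the library scanl' forces the accumulator just before emitting each cell, which is slightly different from the local definition above, but it should prevent the thunk chain in the same way.

```haskell
import Data.List (scanl')

-- Same shape as fibs1, but using the library's strict scan
-- (Data.List.scanl', assumed available: base 4.8 or later).
-- fibStrict is a hypothetical name, not from the original post.
fibStrict :: Int -> Integer
fibStrict n = fibs !! n
  where
    fibs = 0 : scanl' (+) 1 fibs

main :: IO ()
main = print (fibStrict 10)  -- prints 55
```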

The idea of scanl is that its result is incrementally consumed, so that the consumption forces the elements and prevents the build-up of large thunks.
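To illustrate incremental consumption, here is a rough sketch that keeps the ordinary lazy scanl but forces every element while walking to the requested index. The helper nthForcing and the name fibLazyScan are made up for this example; the expectation, not measured here, is that each element is then evaluated while the two values it depends on are already evaluated, so no deep chain builds up.

```haskell
-- Hypothetical helper: like (!!), but forces each element it walks past.
nthForcing :: Int -> [a] -> a
nthForcing _ []     = error "nthForcing: index too large"
nthForcing k (x:xs)
  | k <= 0    = x
  | otherwise = x `seq` nthForcing (k - 1) xs

-- The list is still built with the lazy Prelude scanl;
-- only the consumer is strict.
fibLazyScan :: Int -> Integer
fibLazyScan n = nthForcing n fibs
  where
    fibs = 0 : scanl (+) 1 fibs

main :: IO ()
main = print (fibLazyScan 10)  -- prints 55
```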

Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?

Its optimiser can't see through the algorithm to determine that. scanl is opaque to the compiler; it doesn't know what scanl does.

If we take the exact source code for scanl (either renaming it or hiding scanl from the Prelude; I opted for renaming),

scans                   :: (b -> a -> b) -> b -> [a] -> [b]
scans f q ls            =  q : (case ls of
                                []   -> []
                                x:xs -> scans f (f q x) xs)

and compile the module exporting it (with -O2), and then look at the generated interface file with

ghc --show-iface Scan.hi

we get, for example (there are minor differences between compiler versions),

Magic: Wanted 33214052,
       got    33214052
Version: Wanted [7, 0, 6, 1],
         got    [7, 0, 6, 1]
Way: Wanted [],
     got    []
interface main:Scan 7061
  interface hash: ef57dac14815e2f1f897b42a007c0c81
  ABI hash: 8cfc8dab79de6a51fcad666f1869574f
  export-list hash: 57d6805e5f0b5f76f0dd8dfb228df988
  orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
  flag hash: 1e8135cb44ef6dd330f1f56943d1f463
  used TH splices: False
  where
exports:
  Scan.scans
module dependencies:
package dependencies: base* ghc-prim integer-gmp
orphans: base:GHC.Base base:GHC.Float base:GHC.Real
family instance modules:
import  -/  base:Prelude 1cb4b618cf45281dc97748b1831bf0cd
d79ca4e223c0de0a770a3b88a5e67687
  scans :: forall b a. (b -> a -> b) -> b -> [a] -> [b]
    {- Arity: 3, HasNoCafRefs, Strictness: LLL -}
vectorised variables:
vectorised tycons:
vectorised reused tycons:
scalar variables:
scalar tycons:
trusted: safe-inferred
require own pkg trusted: False

and see that the interface file doesn't expose the unfolding of the function, only its type, arity, strictness, and the fact that it doesn't refer to CAFs.

When a module that imports it is compiled, all that the compiler has to go by is the information exposed by the interface file.

Here, no information is exposed that would allow the compiler to do anything but emit a call to the function.

If the unfolding were exposed, the compiler would have a chance to inline it and analyse the code, knowing the types and the combining function, to produce more eager code that doesn't build thunks.
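One way to get an unfolding into the interface file is an INLINABLE pragma. The sketch below is the same renamed scans with that pragma added; it is not something the setup above used. GHC allows INLINABLE on recursive functions, mainly so that importing modules can specialise them, so whether this changes the generated code for the Fibonacci list is left open here.

```haskell
module Scan (scans) where

-- Same definition as before. The pragma asks GHC to record the
-- unfolding in Scan.hi regardless of the function's size.
{-# INLINABLE scans #-}
scans :: (b -> a -> b) -> b -> [a] -> [b]
scans f q ls = q : case ls of
  []   -> []
  x:xs -> scans f (f q x) xs
```

Running ghc --show-iface Scan.hi again after recompiling should then show an unfolding entry for scans in addition to the arity and strictness information.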

The semantics of scanl, however, are maximally lazy: each element of the output is emitted before the input list is inspected. As a consequence, GHC can't make the addition strict, since that would change the result if the list contained any undefined values:

scanl (+) 1 [undefined] = 1 : scanl (+) (1 + undefined) [] = 1 : (1 + undefined) : []

while

scanl' (+) 1 [undefined] = let x' = 1 + undefined in x' `seq` 1 : scanl' (+) x' []
                         = *** Exception: Prelude.undefined

One could make a variant

scanl'' f b (x:xs) = b `seq` b : scanl'' f (f b x) xs
scanl'' _ b []     = b `seq` [b]

that would produce 1 : *** Exception: Prelude.undefined for the above input, but any strictness would indeed change the result if the list contained undefined values. So even if the compiler knew the unfolding, it couldn't make the evaluation strict unless it could prove that there are no undefined values in the list. That fact is obvious to us, but not to the compiler (and I don't think it would be easy to teach a compiler to recognize that and prove the absence of undefined values).
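The difference can be observed directly. The following sketch copies the strict scan from earlier under the made-up name scanlStrict and compares it with the Prelude's scanl on the same input; length only walks the spine, so the lazy version never touches the undefined sum.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- The strict scan from above, renamed (scanlStrict is a hypothetical
-- name) so that this example stands on its own.
scanlStrict :: (b -> a -> b) -> b -> [a] -> [b]
scanlStrict f a (x:xs) = let x' = f a x in x' `seq` (a : scanlStrict f x' xs)
scanlStrict _ a []     = [a]

main :: IO ()
main = do
  -- Lazy scanl: the spine has two cells and 1 + undefined is never forced.
  print (length (scanl (+) 1 [undefined :: Integer]))  -- prints 2
  -- Strict scan: producing even the first cell evaluates 1 + undefined.
  r <- try (evaluate (length (scanlStrict (+) 1 [undefined :: Integer])))
  putStrLn (either (\e -> "exception: " ++ show (e :: SomeException)) show r)
```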
