
When is memoization automatic in GHC Haskell?

I can't figure out why m1 is apparently memoized while m2 is not in the following:

m1      = ((filter odd [1..]) !!)

m2 n    = ((filter odd [1..]) !! n)

m1 10000000 takes about 1.5 seconds on the first call, and a fraction of that on subsequent calls (presumably it caches the list), whereas m2 10000000 always takes the same amount of time (rebuilding the list with each call). Any idea what's going on? Are there any rules of thumb as to if and when GHC will memoize a function? Thanks.

GHC does not memoize functions.

It does, however, compute any given expression in the code at most once per time that its surrounding lambda-expression is entered, or at most once ever if it is at top level. Determining where the lambda-expressions are can be a little tricky when you use syntactic sugar like in your example, so let's convert these to equivalent desugared syntax:

m1' = (!!) (filter odd [1..])              -- NB: See below!
m2' = \n -> (!!) (filter odd [1..]) n

(Note: The Haskell 98 report actually describes a left operator section like (a %) as equivalent to \b -> (%) a b, but GHC desugars it to (%) a. These are technically different because they can be distinguished by seq. I think I might have submitted a GHC Trac ticket about this.)

Given this, you can see that in m1', the expression filter odd [1..] is not contained in any lambda-expression, so it will only be computed once per run of your program, while in m2', filter odd [1..] will be computed each time the lambda-expression is entered, i.e., on each call of m2'. That explains the difference in timing you are seeing.
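One way to watch this happen is to wrap the list in Debug.Trace.trace, which prints a message at the moment the wrapped expression is first evaluated. The following is only a sketch: the names shared and rebuilt are made up for illustration (they mirror m1' and m2'), and the explicit signatures are an addition so that both functions have the same monomorphic type.

```haskell
import Debug.Trace (trace)

-- Hypothetical names, not from the question.
-- The traced list sits outside any lambda, so it is one thunk that is
-- evaluated at most once and then shared by every call.
shared :: Int -> Integer
shared = (!!) (trace "building list (shared)" (filter odd [1 ..]))

-- The traced list sits inside the lambda, so a fresh list is built each
-- time the lambda is entered (unless an optimisation moves it out).
rebuilt :: Int -> Integer
rebuilt = \n -> (!!) (trace "building list (rebuilt)" (filter odd [1 ..])) n
```

Loaded into GHCi without optimisation, calling shared twice should print its message only on the first call, while rebuilt should print on every call. With optimisation enabled the compiler may transform rebuilt, so the trace output is not guaranteed to look the same there.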


Actually, some versions of GHC, with certain optimization options, will share more values than the above description indicates. This can be problematic in some situations. For example, consider the function

f = \x -> let y = [1..30000000] in foldl' (+) 0 (y ++ [x])

GHC might notice that y does not depend on x and rewrite the function to

f = let y = [1..30000000] in \x -> foldl' (+) 0 (y ++ [x])

In this case, the new version is much less efficient because it will have to read about 1 GB from memory where y is stored, while the original version would run in constant space and fit in the processor's cache. In fact, under GHC 6.12.1, the function f is almost twice as fast when compiled without optimizations as when it is compiled with -O2.
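If this transformation causes trouble, one option is to switch it off for the module. GHC's name for it is full laziness, and the flag -fno-full-laziness disables it. The sketch below is the function f from above with the import it needs; the type signature and the per-module pragma are additions for illustration, and whether y actually gets floated out without the flag depends on the GHC version and optimization level.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
-- The pragma above asks GHC not to float y out of the lambda in this module.
import Data.List (foldl')

-- Same function as in the answer; the signature is an assumption
-- (a 64-bit Int is large enough for this sum).
f :: Int -> Int
f = \x -> let y = [1 .. 30000000] in foldl' (+) 0 (y ++ [x])
```

The same flag can also be passed on the command line instead of in a pragma, which applies it to every module being compiled.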

m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.

See the Haskell wiki on CAFs: http://www.haskell.org/haskellwiki/Constant_applicative_form

There is a crucial difference between the two forms: the monomorphism restriction applies to m1 but not m2, because m2 has explicitly given arguments. So m2's type is general but m1's is specific. The types they are assigned are:

m1 :: Int -> Integer
m2 :: (Integral a) => Int -> a

Most Haskell compilers and interpreters (all of them that I know of, actually) do not memoize polymorphic structures, so m2's internal list is recreated every time it's called, whereas m1's is not.

I'm not sure, because I'm quite new to Haskell myself, but it appears that it's because the second function is parametrized and the first one is not. The nature of a function is that its result depends on the input value, and in the functional paradigm especially it depends ONLY on the input. The obvious implication is that a function with no parameters always returns the same value, over and over, no matter what.

Apparently there's an optimizing mechanism in the GHC compiler that exploits this fact to compute the value of such a function only once for the whole program runtime. It does it lazily, to be sure, but does it nonetheless. I noticed it myself when I wrote the following function:
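To see the effect of the type on its own, one can keep the point-free definition of m1 and change only its signature. This is a sketch: m1Poly is a hypothetical name that does not appear in the question, and the behaviour described in the comments is what one would expect without optimization, since GHC may specialise or otherwise transform the polymorphic version.

```haskell
-- No signature: the monomorphism restriction applies, the element type
-- defaults to Integer, and m1 :: Int -> Integer is an ordinary shared value.
m1 = ((filter odd [1 ..]) !!)

-- Hypothetical variant with an explicit polymorphic signature. Behind the
-- scenes it takes the Integral dictionary as an argument, so the list can
-- be rebuilt on each use even though the definition has no visible parameter.
m1Poly :: Integral a => Int -> a
m1Poly = ((filter odd [1 ..]) !!)
```

In GHCi, :type m1 and :type m1Poly after loading the file show the two types, and timing a large index twice on each gives a rough idea of whether the list was kept.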

primes :: [Integer]
primes = filter isPrime [2..]
  where
    -- n is prime if no number in [2..n-1] divides it
    isPrime n = null [factor | factor <- [2..n-1], factor `divides` n]
      where f `divides` m = (m `mod` f) == 0

Then to test it, I entered GHCi and wrote: primes !! 1000. It took a few seconds, but finally I got the answer: 7927. Then I called primes !! 1001 and got the answer instantly. Similarly, in an instant I got the result for take 1000 primes, because Haskell had already computed the whole thousand-element list in order to return the 1001st element before.

Thus if you can write your function such that it takes no parameters, you probably want to. ;)
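A common way to apply this on purpose is to keep the results in a top-level list, which takes no parameters, and have the function look its answers up there. The Fibonacci example below is only an illustration of that idea and is not taken from the answers above; fibs and fib are hypothetical names.

```haskell
-- Top-level list with no parameters: each element is computed at most
-- once, on demand, and then kept for later lookups.
fibs :: [Integer]
fibs = map fib [0 ..]

-- fib reads its recursive results from the shared list instead of
-- recomputing them. Intended for non-negative arguments only.
fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)
```

The lookups with !! are linear in the index, so this is a sketch of the sharing behaviour rather than an efficient table; the list also stays in memory for as long as fibs is reachable.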
