foldr & foldl Haskell explanation
We have been asked to answer whether foldr or foldl is more efficient.
I am not sure, but doesn't it depend on what I am doing, and especially on what I want to achieve with my functions?
Is there a difference from case to case, or can one say that foldr or foldl is better because...?
Is there a general answer?
Thanks in advance!
A fairly canonical source on this question is Foldr Foldl Foldl' on the Haskell Wiki. In summary, depending on how strictly you can combine elements of the list and what the result of your fold is, you may decide to choose either foldr or foldl'. It's rarely the right choice to choose foldl.
Generally, this is a good example of how you have to keep in mind the laziness and strictness of your functions in order to compute efficiently in Haskell. In strict languages, tail-recursive definitions and TCO are the name of the game, but those kinds of definitions may be too "unproductive" (not lazy enough) for Haskell, leading to the production of useless thunks and fewer opportunities for optimization.
When to choose foldr
If the operation that consumes the result of your fold can operate lazily and your combining function is non-strict in its right argument, then foldr is usually the right choice. The quintessential example of this is the nonfold. First we see that (:) is non-strict on the right:
head (1 : undefined)
1
Then here's nonfold written using foldr:
nonfoldr :: [a] -> [a]
nonfoldr = foldr (:) []
Since (:) creates lists lazily, an expression like head . nonfoldr can be very efficient, requiring just one folding step and forcing just the head of the input list.
head (nonfoldr [1,2,3])
head (foldr (:) [] [1,2,3])
head (1 : foldr (:) [] [2,3])
1
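As a small sketch of the same point (the names copyList, firstThree and firstOnly are made up for illustration), the fold below even works on an infinite list, because only as much of the fold runs as the consumer demands:

```haskell
-- copyList is a hypothetical name; it is the same fold as nonfoldr above.
copyList :: [a] -> [a]
copyList = foldr (:) []

-- Only the first three cells of the infinite input are ever built.
firstThree :: [Int]
firstThree = take 3 (copyList [1 ..])

-- The tail is never touched, so the undefined is never forced.
firstOnly :: Int
firstOnly = head (copyList (1 : undefined))
```

Loading this in GHCi and evaluating firstThree or firstOnly should return immediately, even though one input is infinite and the other has an undefined tail.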
A very common place where laziness wins out is in short-circuiting computations. For instance, a membership test lookup :: Eq a => a -> [a] -> Bool (the Prelude calls this function elem) can be more productive by returning the moment it sees a match.
lookupr :: Eq a => a -> [a] -> Bool
lookupr x = foldr (\y inRest -> if x == y then True else inRest) False
The short-circuiting occurs because we discard inRest in the first branch of the if. The same thing implemented with foldl' can't do that.
import Data.List (foldl') -- foldl' lives in Data.List
lookupl :: Eq a => a -> [a] -> Bool
lookupl x = foldl' (\wasHere y -> if wasHere then wasHere else x == y) False
lookupr 1 [1,2,3,4]
foldr fn False [1,2,3,4] -- fn stands for the combining lambda
if 1 == 1 then True else (foldr fn False [2,3,4])
True
lookupl 1 [1,2,3,4]
foldl' fn False [1,2,3,4]
foldl' fn True [2,3,4]
foldl' fn True [3,4]
foldl' fn True [4]
foldl' fn True []
True
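One way to see the difference concretely, as a sketch: on an infinite list the foldr version can still answer as soon as it finds a match, while a left fold would never return. The names memberR, foundInInfinite and foundInFinite are invented for this example.

```haskell
-- memberR is a hypothetical name for the same kind of fold as lookupr above.
memberR :: Eq a => a -> [a] -> Bool
memberR x = foldr (\y inRest -> x == y || inRest) False

-- (||) is non-strict in its right argument, so the fold stops at the first match.
foundInInfinite :: Bool
foundInInfinite = memberR (3 :: Int) [1 ..]

-- With no match, the fold has to walk the whole (finite) list.
-- A foldl' version of the same test would loop forever on [1 ..],
-- because a left fold must reach the end of the list before it returns.
foundInFinite :: Bool
foundInFinite = memberR 'z' "haskell"
```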
When to choose foldl'
If the consuming operation or the combining function requires that the entire list is processed before it can proceed, then foldl' is usually the right choice. Often the best check for this situation is to ask yourself whether your combining function is strict: if it's strict in the first argument then your whole list must be forced anyway. The quintessential example of this is sum:
sum :: Num a => [a] -> a
sum = foldl' (+) 0
Since (1 + 2) cannot be reasonably consumed prior to actually doing the addition (Haskell isn't smart enough to know that 1 + 2 >= 1 without first evaluating 1 + 2), we don't get any benefit from using foldr. Instead, we'll use the strict combining property of foldl' to make sure that we evaluate things as eagerly as needed:
sum [1,2,3]
foldl' (+) 0 [1,2,3]
foldl' (+) 1 [2,3]
foldl' (+) 3 [3]
foldl' (+) 6 []
6
Note that if we pick foldl here we don't get quite the right behavior, even though the final value is the same. While foldl has the same associativity as foldl', it doesn't force the combining operation with seq like foldl' does, so the additions pile up as a chain of thunks that is only evaluated at the very end.
sumWrong :: Num a => [a] -> a
sumWrong = foldl (+) 0
sumWrong [1,2,3]
foldl (+) 0 [1,2,3]
foldl (+) (0 + 1) [2,3]
foldl (+) ((0 + 1) + 2) [3]
foldl (+) (((0 + 1) + 2) + 3) []
(((0 + 1) + 2) + 3)
((1 + 2) + 3)
(3 + 3)
6
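To make the role of seq concrete, here is a minimal sketch of how a strict left fold can be written by hand. The name strictFoldl is made up; the real foldl' in Data.List may be implemented differently, but it should give the same results.

```haskell
-- strictFoldl is a hypothetical stand-in for foldl'.
strictFoldl :: (b -> a -> b) -> b -> [a] -> b
strictFoldl _ acc []       = acc
strictFoldl f acc (x : xs) =
  let acc' = f acc x
  in acc' `seq` strictFoldl f acc' xs  -- force the new accumulator before recursing

-- The accumulator is a plain number at every step, never a chain of (+) thunks.
total :: Integer
total = strictFoldl (+) 0 [1 .. 100000]
```

Replacing the seq line with a plain recursive call strictFoldl f acc' xs gives back the lazy behavior of foldl shown in the sumWrong trace.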
We get extra, useless thunks (a space leak) if we choose foldr or foldl when we are in the sweet spot of foldl', and we get extra, useless evaluation (a time leak) if we choose foldl' when foldr would have been a better choice.
nonfoldl :: [a] -> [a]
nonfoldl = foldl snoc []
  where snoc acc x = acc ++ [x] -- foldl (:) [] does not typecheck, so append on the right instead
head (nonfoldl [1,2,3])
head (foldl snoc [] [1,2,3])
head (foldl snoc [1] [2,3])
head (foldl snoc [1,2] [3]) -- nonfoldr finished here, O(1)
head (foldl snoc [1,2,3] [])
head [1,2,3]
1 -- this is O(n)
sumR :: Num a => [a] -> a
sumR = foldr (+) 0
sumR [1,2,3]
foldr (+) 0 [1,2,3]
1 + foldr (+) 0 [2, 3] -- thunks begin
1 + (2 + foldr (+) 0 [3])
1 + (2 + (3 + foldr (+) 0 [])) -- O(n) thunks hanging about
1 + (2 + (3 + 0))
1 + (2 + 3)
1 + 5
6 -- forced O(n) thunks
In languages with strict/eager evaluation, folding from the left can be done in constant space, while folding from the right requires linear space (in the number of elements of the list). Because of this, many people who first approach Haskell come over with this preconception.
But that rule of thumb doesn't work in Haskell, because of lazy evaluation. It's possible in Haskell to write constant-space functions with foldr. Here is one example:
find :: (a -> Bool) -> [a] -> Maybe a
find p = foldr (\x next -> if p x then Just x else next) Nothing
Let's try hand-evaluating find even [1, 3, 4]:
-- The definition of foldr, for reference:
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
find even (1:3:4:[])
= foldr (\x next -> if even x then Just x else next) Nothing (1:3:4:[])
= if even 1 then Just 1 else foldr (\x next -> if even x then Just x else next) Nothing (3:4:[])
= foldr (\x next -> if even x then Just x else next) Nothing (3:4:[])
= if even 3 then Just 3 else foldr (\x next -> if even x then Just x else next) Nothing (4:[])
= foldr (\x next -> if even x then Just x else next) Nothing (4:[])
= if even 4 then Just 4 else foldr (\x next -> if even x then Just x else next) Nothing []
= Just 4
The size of the expressions in the intermediate steps has a constant upper bound, which means that this evaluation can be carried out in constant space.
Another reason why foldr in Haskell can run in constant space is the list fusion optimizations in GHC. GHC in many cases can optimize a foldr into a constant-space loop over a constant-space producer. It cannot generally do that for a left fold.
Nonetheless, left folds in Haskell can be written to use tail recursion, which can lead to performance benefits. The thing is that for this to actually succeed you need to be very careful about laziness: naive attempts at writing a tail-recursive algorithm normally lead to linear-space execution, because of an accumulation of unevaluated expressions.
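As a sketch of the kind of care this takes (sumLazy and sumStrict are made-up names, not library code): a tail-recursive loop with a lazy accumulator builds up a chain of unevaluated additions, while the same loop with a forced accumulator keeps it small.

```haskell
-- Naive tail recursion: acc + x is not evaluated until the very end,
-- so the accumulator grows into a chain of unevaluated additions.
sumLazy :: [Integer] -> Integer
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Same loop, but seq forces the accumulator at every step,
-- so at most one pending addition exists at any time.
sumStrict :: [Integer] -> Integer
sumStrict = go 0
  where
    go acc []       = acc
    go acc (x : xs) = acc `seq` go (acc + x) xs
```

Both functions return the same value; the difference is only in how much memory they need along the way. When compiling with optimizations, GHC's strictness analysis can sometimes rescue the lazy version, but that is not something to rely on.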
Takeaway lessons:
When you're starting out in Haskell, try to use library functions from Prelude and Data.List as much as possible, because they've been carefully written to exploit performance features like list fusion.
If you need to fold a list, try foldr first.
Never use foldl, use foldl' (the version that avoids unevaluated expressions).
(Please read the comments on this post. Some interesting points were made and what I wrote here isn't completely true!)
It depends. foldl is usually faster since it's tail recursive, meaning (sort of) that all computation is done in place and there's no call stack. For reference:
foldl f a [] = a
foldl f a (x:xs) = foldl f (f a x) xs
To run foldr we do need a call stack, since there is a "pending" computation for f.
foldr f a [] = a
foldr f a (x:xs) = f x (foldr f a xs)
On the other hand, foldr can short-circuit if f is not strict in its second argument. It's lazier in a way. For example, if we define a new product
prod 0 x = 0
prod x 0 = 0
prod x y = x*y
Then
foldr prod 1 [0..n]
takes constant time in n, but
foldl prod 1 [0..n]
takes linear time. (This will not work using (*), since it does not check if any argument is 0. So we create a non-strict version. Thanks to Ingo and Daniel Lyons for pointing it out in the comments.)
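A runnable sketch of this example, assuming Integer arguments (the original leaves the type open; shortCircuited and leftResult are names made up here). Using an infinite list makes the short-circuit visible: the right fold returns at once, where a left fold would never finish.

```haskell
-- prod as defined above, with an assumed type signature.
prod :: Integer -> Integer -> Integer
prod 0 _ = 0
prod _ 0 = 0
prod x y = x * y

-- The first element is 0, so prod never looks at the rest of the fold.
shortCircuited :: Integer
shortCircuited = foldr prod 1 [0 ..]

-- On a finite list both folds agree; foldl just has to walk the whole list first.
leftResult :: Integer
leftResult = foldl prod 1 [0 .. 1000]
```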