I ran into a weird issue while checking the memory usage of some code I was working on.
Using foldl to sum the elements of a very big list, I get constant memory usage.
Using foldl' I get constant memory usage as well (as expected).
Using foldr, the memory grows and brings my system to its knees (with no stack overflow exception, as I would have expected).
The minimum code needed to trigger it is:
main = print $ foldx (+) 0 [1..100000000000000000000]
where foldx is foldl, foldr, or foldl'.
I was under the impression (as per "Foldr Foldl Foldl'") that the opposite would be true.
I set up a repo with the aforementioned code: https://github.com/framp/hs-fold-perf-test
What's going on here? Is GHC 8.0.x being too smart? I'm on macOS Sierra.
Thanks
Why is memory usage constant with foldl and foldl'?
In this case, GHC sees that foldl can be made strict and essentially rewrites it to use foldl'. See below how GHC optimizes the foldl construct.
Note that this only applies because you compiled with optimizations (-O). Without optimizations, the foldl program consumes all my memory and crashes.
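To make the difference between the two left folds concrete, here is a minimal sketch of how they can be defined. The names myFoldl and myFoldl' are hypothetical, chosen only to avoid clashing with the Prelude, and the definitions are simplified list-only versions rather than the actual library source.

```haskell
module Main where

-- Hypothetical, list-only re-implementations for illustration.

-- Lazy left fold: the new accumulator (f acc x) is passed on unevaluated,
-- so without optimizations it can grow into a chain of thunks
-- ((0 + 1) + 2) + ... that is only forced at the very end.
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ acc []     = acc
myFoldl f acc (x:xs) = myFoldl f (f acc x) xs

-- Strict left fold: seq forces the new accumulator to weak head normal
-- form before the recursive call, so no chain of thunks accumulates.
myFoldl' :: (b -> a -> b) -> b -> [a] -> b
myFoldl' _ acc []     = acc
myFoldl' f acc (x:xs) =
  let acc' = f acc x
  in acc' `seq` myFoldl' f acc' xs

main :: IO ()
main = do
  print (myFoldl  (+) 0 [1 .. 100 :: Integer])
  print (myFoldl' (+) 0 [1 .. 100 :: Integer])
```

Both calls produce the same sum; the difference is only in when the additions are performed, which is what matters for memory usage on very long lists.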
Looking at the output of ghc -O -fforce-recomp -ddump-simpl foldl.hs, we can see that GHC eliminates the huge list entirely and optimizes the expression into a tail-recursive function:
Rec {
-- RHS size: {terms: 20, types: 5, coercions: 0, joins: 0/0}
Main.main_go [Occ=LoopBreaker] :: Integer -> Integer -> Integer
[GblId, Arity=2, Str=<S,U><S,1*U>]
Main.main_go
  = \ (x_a36m :: Integer) (eta_B1 :: Integer) ->
      case integer-gmp-1.0.0.1:GHC.Integer.Type.gtInteger#
             x_a36m lim_r4Yv
      of wild_a36n
      { __DEFAULT ->
      case GHC.Prim.tagToEnum# @ Bool wild_a36n of {
        False ->
          Main.main_go
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
               x_a36m 1)
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger eta_B1 x_a36m);
        True -> eta_B1
      }
      }
end Rec }
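This explains the constant memory usage: main_go carries only the current element (x_a36m) and the running sum (eta_B1), both strict arguments, and calls itself in tail position, so neither the list nor a chain of thunks is ever built.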
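For readers less used to Core, the loop can be read roughly as the following hand-written Haskell. This is only a sketch of what the dump amounts to, not something GHC emits: the names sumTo, go, lim, x and acc are hypothetical stand-ins for main_go, lim_r4Yv, x_a36m and eta_B1, and the bang patterns mimic the strictness GHC inferred.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Hypothetical source-level reading of the Core loop main_go:
-- count x upwards from 1 and add it to a strict accumulator.
sumTo :: Integer -> Integer
sumTo lim = go 1 0
  where
    go :: Integer -> Integer -> Integer
    go !x !acc
      | x > lim   = acc                   -- the "True -> eta_B1" branch
      | otherwise = go (x + 1) (acc + x)  -- the "False" branch, a tail call

main :: IO ()
main = print (sumTo 100)
```

Each iteration only needs the two Integer arguments, which is why memory stays flat no matter how large the limit is.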
Why does foldr need that much memory?
foldr builds up a lot of thunks, which are essentially unfinished computations that will eventually hold the correct value. When trying to evaluate the foldr expression, this happens:
foldr (+) 0 [1..100]
== (+) 1 $ foldr (+) 0 [2..100]
== (+) 1 $ (+) 2 $ foldr (+) 0 [3..100]
...
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ (+) 100 0 -- at this point there are 100
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ 100       -- unevaluated computations, which
== (+) 1 $ (+) 2 $ .. $ (+) 98 $ 199       -- take up a lot of memory
...
== (+) 1 $ 5049
== 5050
The limit of 100000000000000000000 is simply big enough for the thunks to take up more space than your RAM, and your program crashes.
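The expansion above follows directly from how a right fold is defined. As a minimal sketch (myFoldr is a hypothetical, list-only version, not the library source), the recursive call sits inside the argument of f, so with a strict operator like (+) nothing can be added until the end of the list is reached. For summing, the usual alternative is foldl' from Data.List, shown here on a deliberately small range.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical, list-only right fold for illustration. The recursive call
-- is an argument of f, not a tail call, so every element leaves a pending
-- application of f behind until the end of the list is reached.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

main :: IO ()
main = do
  -- Fine for short lists: only 100 pending additions.
  print (myFoldr (+) 0 [1 .. 100 :: Integer])
  -- Strict left fold: the running sum is evaluated at every step.
  print (foldl' (+) 0 [1 .. 1000000 :: Integer])
```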