
Foldl memory performance in GHC 8.0.x

I ran into a weird issue while checking the memory usage of some code I was working on.

Using foldl to sum the elements of a very big list, I get constant memory usage.

Using foldl' I get constant memory usage as well (as expected).

Using foldr, memory grows and brings my system to its knees (with no stack overflow exception, as I would expect).

The minimal code needed to trigger it is:

main = print $ foldx (+) 0 [1..100000000000000000000]

where foldx is foldl, foldr, or foldl'.

I was under the impression (as per Foldr Foldl Foldl') that the opposite would be true.

I set up a repo with the aforementioned code: https://github.com/framp/hs-fold-perf-test

What's going on here? Is it GHC 8.0.x being too smart? I'm on macOS Sierra.

Thanks

foldl and foldl'

In this case, GHC sees that foldl can be made strict and essentially rewrites it to utilise foldl'. See below for how GHC optimizes the foldl construct.

Note that this only applies because you compiled with optimizations (-O). Without optimizations the foldl program consumes all my memory and crashes.
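
If you want constant memory regardless of the optimization level, you can call foldl' from Data.List explicitly instead of relying on GHC rewriting foldl for you. A minimal sketch (not the asker's exact program; the smaller bound is mine so it actually finishes):

import Data.List (foldl')

-- foldl' forces the accumulator at every step, so no chain of
-- unevaluated (+) thunks builds up, even when compiled without -O.
main :: IO ()
main = print $ foldl' (+) 0 [1..100000000 :: Integer]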


Looking at the output of ghc -O -fforce-recomp -ddump-simpl foldl.hs we can see that GHC eliminates the huge list used entirely and optimizes the expression to a tail recursive function:

Rec {
-- RHS size: {terms: 20, types: 5, coercions: 0, joins: 0/0}
Main.main_go [Occ=LoopBreaker] :: Integer -> Integer -> Integer
[GblId, Arity=2, Str=<S,U><S,1*U>]
Main.main_go
  = \ (x_a36m :: Integer) (eta_B1 :: Integer) ->
      case integer-gmp-1.0.0.1:GHC.Integer.Type.gtInteger#
             x_a36m lim_r4Yv
      of wild_a36n
      { __DEFAULT ->
      case GHC.Prim.tagToEnum# @ Bool wild_a36n of {
        False ->
          Main.main_go
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
               x_a36m 1)
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger eta_B1 x_a36m);
        True -> eta_B1
      }
      }
end Rec }

Which explains why it runs with constant memory usage.
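
For intuition, that Core loop corresponds roughly to the following hand-written, tail-recursive Haskell with a strict accumulator (the names go and lim are mine, not something GHC emits as source):

{-# LANGUAGE BangPatterns #-}

lim :: Integer
lim = 100000000000000000000

-- Walk x from 1 up to lim, forcing the running total acc at each step,
-- mirroring the strict main_go worker in the Core above.
go :: Integer -> Integer -> Integer
go !x !acc
  | x > lim   = acc
  | otherwise = go (x + 1) (acc + x)

main :: IO ()
main = print (go 1 0)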

Why does foldr need that much memory?

foldr builds up a lot of thunks: unfinished computations which will eventually hold the correct value. When you try to evaluate the foldr expression, something like this happens:

foldr (+) 0 [1..100]
== (+) 1 $ foldr (+) 0 [2..100]
== (+) 1 $ (+) 2 $ foldr (+) 0 [3..100]
...
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ (+) 100 0 -- at this point there are 100
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ 100       -- unevaluated computations, which
== (+) 1 $ (+) 2 $ .. $ (+) 98 $ 199       -- take up a lot of memory
...
== (+) 1 $ 5049
== 5050

The limit of 100000000000000000000 is just big enough for the thunks to take up more space than your RAM, and your program crashes.
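
For comparison, this thunk build-up is exactly what foldl' avoids by forcing the accumulator before recursing. A sketch of the classic definition (GHC's actual implementation is written in terms of foldr for list fusion, but it behaves the same way):

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' _ z []     = z
foldl' f z (x:xs) = let z' = f z x              -- compute the new accumulator
                    in z' `seq` foldl' f z' xs  -- force it before recursing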
