
Foldl memory performance in GHC 8.0.x

I ran into a weird issue while checking the memory usage of some code I was working on.

Using foldl to sum the elements of a very big list, I get constant memory usage.

Using foldl' I get constant memory usage as well (as expected).

Using foldr, memory grows and brings my system to its knees (with no stack overflow exception, as I would expect).

The minimal code needed to trigger it is:

main = print $ foldx (+) 0 [1..100000000000000000000]

where foldx is foldl, foldr, or foldl'.

I was under the impression (as per Foldr Foldl Foldl') that the opposite would be true.

I set up a repo with the aforementioned code: https://github.com/framp/hs-fold-perf-test

What's going on here? Is it GHC 8.0.x being too smart? I'm on macOS Sierra.

Thanks

foldl and foldl'

In this case, GHC sees that foldl can be made strict and essentially rewrites it to utilise foldl'. See below for how GHC optimizes the foldl construct.

Note that this only applies because you compiled with optimizations (-O). Without optimizations the foldl program consumes all my memory and crashes.
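
If you want constant memory regardless of the optimization level, you can call foldl' from Data.List explicitly instead of relying on GHC rewriting foldl for you. A minimal sketch (not the asker's exact program; the smaller bound is mine so it actually finishes):

import Data.List (foldl')

-- foldl' forces the accumulator at every step, so no chain of
-- unevaluated (+) thunks builds up, even when compiled without -O.
main :: IO ()
main = print $ foldl' (+) 0 [1..100000000 :: Integer]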


Looking at the output of ghc -O -fforce-recomp -ddump-simpl foldl.hs we can see that GHC eliminates the huge list used entirely and optimizes the expression to a tail recursive function:

Rec {
-- RHS size: {terms: 20, types: 5, coercions: 0, joins: 0/0}
Main.main_go [Occ=LoopBreaker] :: Integer -> Integer -> Integer
[GblId, Arity=2, Str=<S,U><S,1*U>]
Main.main_go
  = \ (x_a36m :: Integer) (eta_B1 :: Integer) ->
      case integer-gmp-1.0.0.1:GHC.Integer.Type.gtInteger#
             x_a36m lim_r4Yv
      of wild_a36n
      { __DEFAULT ->
      case GHC.Prim.tagToEnum# @ Bool wild_a36n of {
        False ->
          Main.main_go
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
               x_a36m 1)
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger eta_B1 x_a36m);
        True -> eta_B1
      }
      }
end Rec }

Which explains why it runs with constant memory usage.
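
For intuition, that Core loop corresponds roughly to the following hand-written, tail-recursive Haskell with a strict accumulator (the names go and lim are mine, not something GHC emits as source):

{-# LANGUAGE BangPatterns #-}

lim :: Integer
lim = 100000000000000000000

-- Walk x from 1 up to lim, forcing the running total acc at each step,
-- mirroring the strict main_go worker in the Core above.
go :: Integer -> Integer -> Integer
go !x !acc
  | x > lim   = acc
  | otherwise = go (x + 1) (acc + x)

main :: IO ()
main = print (go 1 0)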

Why does foldr need that much memory?

foldr builds up a lot of thunks: unfinished computations which will eventually hold the correct value. When you try to evaluate the foldr expression, something like this happens:

foldr (+) 0 [1..100]
== (+) 1 $ foldr (+) 0 [2..100]
== (+) 1 $ (+) 2 $ foldr (+) 0 [3..100]
...
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ (+) 100 0 -- at this point there are 100
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ 100       -- unevaluated computations, which
== (+) 1 $ (+) 2 $ .. $ (+) 98 $ 199       -- take up a lot of memory
...
== (+) 1 $ 5049
== 5050

The limit of 100000000000000000000 is just big enough for the thunks to take up more space than your RAM, and your program crashes.
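
For comparison, this thunk build-up is exactly what foldl' avoids by forcing the accumulator before recursing. A sketch of the classic definition (GHC's actual implementation is written in terms of foldr for list fusion, but it behaves the same way):

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' _ z []     = z
foldl' f z (x:xs) = let z' = f z x              -- compute the new accumulator
                    in z' `seq` foldl' f z' xs  -- force it before recursing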
