
How does Haskell's laziness work?

Consider this function that doubles all the elements in a list:

doubleMe [] = []
doubleMe (x:xs) = (2*x):(doubleMe xs)

Then consider the expression

doubleMe (doubleMe [a,b,c])

It seems obvious that, at runtime, this first expands to:

doubleMe ( (2*a):(doubleMe [b,c]) )

(It's obvious because no other possibilities exist as far as I can see).

But my question is this: Why exactly does this now expand to

2*(2*a) : doubleMe( doubleMe [b,c] )

instead of

doubleMe( (2*a):( (2*b) : doubleMe [c] ) )

?

Intuitively, I know the answer: Because Haskell is lazy. But can someone give me a more precise answer?

Is there something special about lists that causes this, or is the idea more general than that just lists?

doubleMe (doubleMe [a,b,c]) does not expand to doubleMe ((2*a) : doubleMe [b,c]). It expands to:

case doubleMe [a,b,c] of
  [] -> []
  (x:xs) -> (2*x):(doubleMe xs)

That is, the outer function call is expanded first. That's the main difference between a lazy language and a strict one: when expanding a function call, you don't first evaluate the argument; instead you replace the function call with its body and leave the argument as-is for now.

Now the inner doubleMe needs to be expanded, because the pattern match needs to know the structure of its scrutinee before it can proceed, so we get:

case (2*a):(doubleMe [b,c]) of
  [] -> []
  (x:xs) -> (2*x):(doubleMe xs)

Now the pattern match can be replaced with the body of the second branch, because we now know the second branch is the one that matches. So we substitute (2*a) for x and doubleMe [b,c] for xs, giving us:

(2*(2*a)):(doubleMe (doubleMe [b,c]))

So that's how we arrive at that result.
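A runnable sketch of the same idea: here doubleMe is written with the explicit case expression from the expansion above, and only the head of the result is demanded, which forces exactly one unfolding of each doubleMe call.

```haskell
-- doubleMe written with an explicit case, as in the expansion above.
-- This is exactly how GHC desugars the two-equation definition.
doubleMe :: Num a => [a] -> [a]
doubleMe ys =
  case ys of
    []     -> []
    (x:xs) -> (2*x) : doubleMe xs

main :: IO ()
main =
  -- Demanding only the head produces only one cons cell per layer:
  -- 2*(2*1) is the first thing computed, and nothing else is.
  print (head (doubleMe (doubleMe [1, 2, 3])))  -- 4
```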

Your “obvious” first step isn't actually quite so obvious. In fact what happens is rather like this:

doubleMe (...)
doubleMe ( { [] | (_:_) }? )
doubleMe ( doubleMe (...)! )

and only at that point does it actually “enter” the inner function. So it proceeds

doubleMe ( doubleMe (...) )
doubleMe ( doubleMe( { [] | (_:_) }? ) )
doubleMe ( doubleMe( a:_ ! ) )
doubleMe ( (2*a) : doubleMe(_) )
doubleMe ( (2*a):_ ! )

Now the outer doubleMe function has the "answer" to its { [] | (_:_) }? question, which was the only reason anything in the inner function was evaluated at all.

Actually, the next step is also not necessarily what you thought: it depends on how you evaluate the outer result! For instance, if the whole expression were tail $ doubleMe (doubleMe [a,b,c]), then it would actually expand more like

tail( { [] | (_:_) }? )
tail( doubleMe(...)! )
tail( doubleMe ( { [] | (_:_) }? ) )
...
tail( doubleMe ( doubleMe( a:_ ! ) ) )
tail( doubleMe ( _:_ ) )
tail( _ : doubleMe ( _ ) )
doubleMe ( ... )

i.e. it would in fact never really get to 2*a at all!
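This claim can be checked directly: below, the head of the input list is undefined, so the program would crash if 2*a were ever computed. But tail discards the head thunk unevaluated, so asking for the next element succeeds.

```haskell
-- If 2*(2*undefined) were ever evaluated, this would crash.
doubleMe :: Num a => [a] -> [a]
doubleMe []     = []
doubleMe (x:xs) = (2*x) : doubleMe xs

main :: IO ()
main =
  -- tail skips the head thunk 2*(2*undefined); head then forces
  -- the second element, 2*(2*2).
  print (head (tail (doubleMe (doubleMe (undefined : [2, 3])))))  -- 8
```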

Others have already answered the general question. Let me add something on this specific point:

Is there something special about lists that causes this, or is the idea more general than that just lists?

No, lists are not special. Every data type in Haskell has lazy semantics. Let's try a simple example using the pair type for integers, (Int, Int).

let pair :: (Int,Int)
    pair = (1, fst pair)
 in snd pair

Above, fst and snd are the pair projections, returning the first and second components of a pair. Also note that pair is a recursively defined pair: yes, in Haskell you can define anything recursively, not just functions.

Under a lazy semantics, the above expression is roughly evaluated like this:

snd pair
= -- definition of pair
snd (1, fst pair)
= -- application of snd
fst pair
= -- definition of pair
fst (1, fst pair)
= -- application of fst
1

By comparison, using an eager semantics, we would evaluate it like this:

snd pair
= -- definition of pair
snd (1, fst pair)
= -- must evaluate arguments before application, expand pair again
snd (1, fst (1, fst pair))
= -- must evaluate arguments
snd (1, fst (1, fst (1, fst pair)))
= -- must evaluate arguments
...

In the eager evaluation, we insist on evaluating arguments before applying fst/snd, and we obtain an infinitely looping program. In some languages this will trigger a "stack overflow" error.

In the lazy evaluation, we apply functions early, even if the argument is not fully evaluated. This makes fst (1, infiniteLoop) return 1 immediately, since the second component is never demanded.

So, lazy evaluation is not specific to lists. Anything is lazy in Haskell: trees, functions, tuples, records, user-defined data types, etc.
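The recursive pair from the derivation above can be run as-is; the second print is a sketch of the same point using undefined in place of an infinite loop.

```haskell
-- A recursively defined pair: legal in Haskell because tuple
-- components are lazy, so the knot unties after one unfolding.
pair :: (Int, Int)
pair = (1, fst pair)

main :: IO ()
main = do
  print (snd pair)                    -- snd pair = fst pair = 1
  print (fst (1, undefined :: Int))   -- the second component is never forced
```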

(Nitpick: if the programmer really asks for them, it is possible to define types having strict, eagerly evaluated components. This can be done using strictness annotations, or using extensions such as unboxed types. While these sometimes have their uses, they are not commonly found in Haskell programs.)

This is a good time to pull out equational reasoning, which means we can substitute a function application with the function's definition (modulo renaming things to avoid clashes). I'm going to rename doubleMe to d for brevity:

d [] = []                           -- Rule 1
d (x:xs) = (2*x) : d xs             -- Rule 2

d [1, 2, 3] = d (1:2:3:[])
            = (2*1) : d (2:3:[])    -- Rule 2
            = 2 : d (2:3:[])        -- Reduce
            = 2 : (2*2) : d (3:[])  -- Rule 2
            = 2 : 4 : d (3:[])      -- Reduce
            = 2 : 4 : (2*3) : d []  -- Rule 2
            = 2 : 4 : 6 : d []      -- Reduce
            = 2 : 4 : 6 : []        -- Rule 1
            = [2, 4, 6]

So now if we were to perform this with 2 layers of doubleMe / d :

d (d [1, 2, 3]) = d (d (1:2:3:[]))
                = d ((2*1) : d (2:3:[]))    -- Rule 2 (inner)
                = d (2 : d (2:3:[]))        -- Reduce
                = (2*2) : d (d (2:3:[]))    -- Rule 2 (outer)
                = 4 : d (d (2:3:[]))        -- Reduce
                = 4 : d ((2*2) : d (3:[]))  -- Rule 2 (inner)
                = 4 : d (4 : d (3:[]))      -- Reduce
                = 4 : 8 : d (d (3:[]))      -- Rule 2 (outer) / Reduce
                = 4 : 8 : d (6 : d [])      -- Rule 2 (inner) / Reduce
                = 4 : 8 : 12 : d (d [])     -- Rule 2 (outer) / Reduce
                = 4 : 8 : 12 : d []         -- Rule 1 (inner)
                = 4 : 8 : 12 : []           -- Rule 1 (outer)
                = [4, 8, 12]
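A quick check that the hand expansion above matches what GHC actually computes (using d as shorthand for doubleMe, as in the derivation):

```haskell
-- The same two-equation definition used in the derivation.
d :: Num a => [a] -> [a]
d []     = []
d (x:xs) = (2*x) : d xs

main :: IO ()
main = print (d (d [1, 2, 3]))  -- [4,8,12]
```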

Alternatively, you can choose to reduce at different points in time, resulting in

d (d [1, 2, 3]) = d (d (1:2:3:[]))
                = d ((2*1) : d (2:3:[]))
                = (2*(2*1)) : d (d (2:3:[]))
                = -- Rest of the steps left as an exercise for the reader
                = (2*(2*1)) : (2*(2*2)) : (2*(2*3)) : []
                = (2*2) : (2*4) : (2*6) : []
                = 4 : 8 : 12 : []
                = [4, 8, 12]

These are two possible expansions for this computation, but it's not specific to lists. You could apply it to a tree type:

data Tree a = Leaf a | Node a (Tree a) (Tree a)

Where pattern matching on Leaf and Node would be akin to matching on [] and : respectively, if you consider the list definition of

data [] a = [] | a : [a]

The reason I say these are two possible expansions is that the order of expansion is up to the specific runtime and the optimizations of the compiler you're using: if it sees an optimization that would make your program execute much faster, it can choose it. This is why laziness is often a boon: you don't have to think as much about the order in which things occur, because the compiler does that thinking for you. This wouldn't be possible in a language without purity, such as C#/Java/Python: there you can't rearrange computations, since those computations might have side effects that depend on the order. But pure calculations have no side effects, so the compiler has an easier job optimizing your code.
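To see that the same laziness applies to the Tree type above, here is a sketch (doubleTree and rootVal are hypothetical analogues of doubleMe and head, not from the original question): pattern matching forces only the outermost constructor, so the subtrees can stay completely unevaluated.

```haskell
data Tree a = Leaf a | Node a (Tree a) (Tree a)

-- Doubles every label, structured exactly like doubleMe.
doubleTree :: Num a => Tree a -> Tree a
doubleTree (Leaf x)     = Leaf (2*x)
doubleTree (Node x l r) = Node (2*x) (doubleTree l) (doubleTree r)

-- Extracts the root label; demands only the outermost constructor.
rootVal :: Tree a -> a
rootVal (Leaf x)     = x
rootVal (Node x _ _) = x

main :: IO ()
main =
  -- The subtrees are undefined, yet this succeeds: matching on Node
  -- never forces l or r, just as matching on (:) never forces the tail.
  print (rootVal (doubleTree (Node 3 undefined undefined)))  -- 6
```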

doubleMe [] = []
doubleMe (x:xs) = (2*x):(doubleMe xs)

doubleMe (doubleMe [a,b,c])

I think different people expand these differently. I don't mean that they produce different results or anything, just that among people who do it correctly there isn't really a standard notation. Here's how I would do it:

-- Let's manually compute the result of *forcing* the following expression.
-- ("Forcing" = demanding that the expression be evaluated only just enough
-- to pattern match on its data constructor.)
doubleMe (doubleMe [a,b,c])

    -- The argument to the outer `doubleMe` is not headed by a constructor,
    -- so we must force the inner application of `doubleMe`.  To do that, 
    -- first force its argument to make it explicitly headed by a
    -- constructor.
    = doubleMe (doubleMe (a:[b,c]))

    -- Now that the argument has been forced we can tell which of the two
    -- `doubleMe` equations applies to it: the second one.  So we use that
    -- to rewrite it.
    = doubleMe (2*a : doubleMe [b,c])

    -- Since the argument to the outer `doubleMe` in the previous expression
    -- is headed by the list constructor `:`, we're done with forcing it.
    -- Now we use the second `doubleMe` equation to rewrite the outer
    -- function application. 
    = 2*2*a : doubleMe (doubleMe [b, c])

    -- And now we've arrived at an expression whose outermost operator
    -- is a data constructor (`:`).  This means that we've successfully 
    -- forced the expression, and can stop here.  There wouldn't be any
    -- further evaluation unless some consumer tried to match either of 
    -- the two subexpressions of this result. 

This is the same as sepp2k's and leftaroundabout's answers, just written differently. sepp2k's answer has a case expression appearing seemingly out of nowhere: the multi-equation definition of doubleMe got implicitly rewritten as a single case expression. leftaroundabout's answer has a { [] | (_:_) }? thing in it, which apparently is notation for "I have to force the argument until it looks like either [] or (_:_)".

bhelkir's answer is similar to mine, but it's recursively forcing all of the subexpressions of the result as well, which wouldn't happen unless you have a consumer that demands it.

So no disrespect to anybody, but I like mine better. :-P

Write \y.m to denote the abstracted version of doubleMe (a lambda abstraction with bound variable y and body m), and t for the list [a,b,c]. Then the term you want to reduce is

\y.m (\y.m t)

In other words, there are two redexes. Haskell prefers to fire outermost redexes first, since it is a normal-order-ish language. However, this isn't quite true: doubleMe isn't really \y.m, and only really has a redex when its "argument" has the correct shape (that of a list). Since the outermost application isn't yet a redex, and there are no redexes inside \y.m itself, we move to the right of the application (Haskell also prefers to evaluate leftmost redexes first). Now, t really does have the shape of a list, so the redex (\y.m t) fires.

\y.m ((2*a) : (\y.m t'))

And then we go back to the top, and do the whole thing again. Except this time, the outermost term has a redex.

It does so because of how lists are defined, combined with laziness. When you ask for the head of a list, it evaluates only the element you asked for and saves the rest for later. All list-processing operations are built on this head:rest pattern, so intermediate lists are never fully materialized.
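One consequence of the head:rest pattern, as a sketch: doubleMe composes with itself over an infinite list, because each consumer only ever demands one cons cell at a time.

```haskell
doubleMe :: Num a => [a] -> [a]
doubleMe []     = []
doubleMe (x:xs) = (2*x) : doubleMe xs

main :: IO ()
main =
  -- [1..] is infinite, but take 3 demands only three cons cells,
  -- so only three elements of each layer are ever produced.
  print (take 3 (doubleMe (doubleMe [1 ..])))  -- [4,8,12]
```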
