
where clauses in list comprehensions

What is the difference between the following two definitions?

cp [] = [[]]
cp (xs:xss) = [x:ys | x <- xs, ys <- cp xss]
----------------------------------------------
cp [] = [[]]
cp (xs:xss) = [x:ys | x <- xs, ys <- yss]
              where yss = cp xss

Sample output: cp [[1,2,3],[4,5]] => [[1,4],[1,5],[2,4],[2,5],[3,4],[3,5]]

According to Thinking Functionally With Haskell (p. 92), the second version is "a more efficient definition...[which] guarantees that cp xss is computed just once," though the author never explains why. I would have thought they were equivalent.

The two definitions are equivalent in the sense that they denote the same value, of course.

Operationally, they differ in sharing behavior under call-by-need evaluation. jcast already explained why, but I want to add a shortcut that does not require explicitly desugaring the list comprehension. The rule is: any expression that is syntactically in a position where it could depend on a variable x will be recomputed each time x is bound to a value, even if the expression does not actually depend on x.

In your case, in the first definition, x is in scope at the position where cp xss appears, so cp xss will be re-evaluated for each element x of xs. In the second definition, cp xss appears outside the scope of x, so it is computed just once.
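
To see the rule in action, here is a small, self-contained sketch (mine, not from the answer) that tags the recursive call with trace from Debug.Trace. Run without optimizations (e.g. in GHCi), the first version prints its message once for every binding of x at each level, while the second prints it only once per level:

import Debug.Trace (trace)

-- Version 1: the traced call appears where x is in scope,
-- so a fresh thunk is built and forced for every x.
cp1 :: [[Int]] -> [[Int]]
cp1 []       = [[]]
cp1 (xs:xss) = [x:ys | x <- xs, ys <- trace "evaluating cp1 xss" (cp1 xss)]

-- Version 2: the traced call is bound outside the scope of x,
-- so the thunk is shared and forced at most once per level.
cp2 :: [[Int]] -> [[Int]]
cp2 []       = [[]]
cp2 (xs:xss) = [x:ys | x <- xs, ys <- yss]
  where yss = trace "evaluating cp2 xss" (cp2 xss)

main :: IO ()
main = do
  print (cp1 [[1,2,3],[4,5]])  -- message printed several times
  print (cp2 [[1,2,3],[4,5]])  -- message printed once per level

(With -O, GHC's floating passes may of course change these counts, as the disclaimers below note.)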

Then the usual disclaimers apply, namely:

  • The compiler is not required to adhere to the operational semantics of call-by-need evaluation, only the denotational semantics. So it might compute things fewer times (floating out) or more times (floating in) than you would expect based on the above rule.

  • It's not true in general that more sharing is better. In this case, for example, it's probably not better because the size of cp xss grows as quickly as the amount of work that it took to compute it in the first place. In this situation the cost of reading the value back from memory can exceed that of recomputing the value (due to the cache hierarchy and the GC).

Well, a naive de-sugaring would be:

cp [] = [[]]
cp (xs:xss) = concatMap (\x -> concatMap (\ys -> [x:ys]) (cp xss)) xs
----------------------------------------------
cp [] = [[]]
cp (xs:xss) = let yss = cp xss in concatMap (\x -> concatMap (\ys -> [x:ys]) yss) xs

As you can see, in the first version the call cp xss is inside a lambda. Unless the optimizer moves it, that means it will get re-evaluated each time the function \x -> concatMap (\ys -> [x:ys]) (cp xss) gets called. By floating it out, we avoid the recomputation.
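
If you want to express that floating directly in the source without a where clause, a let qualifier placed before the x generator also works: by the standard desugaring of list comprehensions, it becomes exactly the floated-out let above. (This variant is my illustration, not part of the original answer.)

cp [] = [[]]
cp (xs:xss) = [x:ys | let yss = cp xss, x <- xs, ys <- yss]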

At the same time, GHC does have an optimization pass (full laziness) that floats expensive computations out of loops like this, so it may convert the first version into the second automatically. Conversely, your book can say the second version 'guarantees' that cp xss is computed only once because, when an expression is expensive to compute, compilers are generally very hesitant to inline it (which would turn the second version back into the first).
