Haskell: Lists vs Streams

I've noticed streams seem to act a lot like lists, except with constant time append. Of course, adding constant time append to lists isn't too complicated, and DList does exactly that.

Let's assume for the rest of the discussion that either lists have constant time append, or that we're simply not interested in it.

My thought is that Haskell lists should simply be implemented as streams. For this not to be the case, I assume that the following would need to hold:

  1. There are cases where lists are better than streams AND
  2. There are cases where streams are better than lists.

My question is: what are examples of the two above cases?

Note: For the purpose of this question, please ignore easily fixable omissions in the particular implementations I've discussed. I'm looking more for core structural differences here.

Additional info:

I guess part of what I'm getting at here is say if we write [1..1000000] , does a Haskell compiler (say GHC) do:

  1. Make a list OR
  2. Make an object with two ints: 1 and 1000000 which fully describes the list.

If it's case (1), why do this, as creating intermediate lists seems to be an unnecessary performance penalty?

Or if it's case (2), then why do we need streams?

When you write [1..1000000] , what GHC really does is create an object that contains 1 and 1000000 that describes how to build the list of interest; that object is called a "thunk". The list is only built as necessary to satisfy case scrutinees; for example, you might write:

printList [] = putStrLn ""
printList (x:xs) = putStrLn (show x) >> printList xs

main = printList [1..1000000]

Which evaluates like this:

main
= { definition of main }
printList [1..1000000]
= { list syntax sugar }
printList (enumFromTo 1 1000000)
= { definition of printList }
case enumFromTo 1 1000000 of
    [] -> putStrLn ""
    x:xs -> putStrLn (show x) >> printList xs
= { we have a case, so must start evaluating enumFromTo;
    I'm going to skip a few steps here involving unfolding
    the definition of enumFromTo and doing some pattern
    matching }
case 1 : enumFromTo 2 1000000 of
    [] -> putStrLn ""
    x:xs -> putStrLn (show x) >> printList xs
= { now we know which pattern to choose }
putStrLn (show 1) >> printList (enumFromTo 2 1000000)

Then you'd find that 1 was printed to the console, and we'd start from near the top with enumFromTo 2 1000000 instead of enumFromTo 1 1000000 . Eventually, you'd get all the numbers printed and it would come time to evaluate

printList (enumFromTo 1000000 1000000)
= { definition of printList }
case enumFromTo 1000000 1000000 of
    [] -> putStrLn ""
    x:xs -> putStrLn (show x) >> printList xs
= { skipping steps again to evaluate enumFromTo }
case [] of
    [] -> putStrLn ""
    x:xs -> putStrLn (show x) >> printList xs
= { now we know which pattern to pick }
putStrLn ""

and evaluation would be finished.
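To make the skipped steps concrete, here is a minimal sketch of how an enumFromTo-style producer can build its list lazily, one cons cell per pattern match. The real Prelude definition differs in its details, and the name myEnumFromTo is mine:

```haskell
-- A lazy producer: each cons cell is only built when a case scrutinee
-- (like the one in printList) demands it.
myEnumFromTo :: Int -> Int -> [Int]
myEnumFromTo lo hi
  | lo > hi   = []
  | otherwise = lo : myEnumFromTo (lo + 1) hi

main :: IO ()
main = print (take 3 (myEnumFromTo 1 1000000))
-- only three cons cells are ever constructed here
```

Forcing the head yields `1 : myEnumFromTo 2 1000000`, exactly the shape seen in the evaluation trace above.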

The reason we need streams is a bit subtle. The original paper, Stream fusion: From lists to streams to nothing at all , probably has the most complete explanation. The short version is that when you have a long pipeline:

concatMap foo . map bar . filter pred . break isSpecial

...it's not so obvious how to get the compiler to compile away all the intermediate lists. You might notice that we can think of the lists as having a sort of "state" that's being iterated, and that each of these functions, rather than traversing a list, is just changing the way the state gets modified on each iteration. The Stream type tries to make this explicit, and the result is stream fusion. Here's how it looks: we first convert all these functions into stream versions:

(toList . S.concatMap foo . fromList) .
(toList . S.map bar . fromList) .
(toList . S.filter pred . fromList) .
(toList . S.break isSpecial . fromList)

then observe that we can always annihilate fromList . toList :

toList . S.concatMap foo . S.map bar . S.filter pred . S.break isSpecial . fromList

...and then the magic happens because the chain S.concatMap foo . S.map bar . S.filter pred . S.break isSpecial builds up an iterator explicitly, rather than building it implicitly by internally constructing and then immediately annihilating actual lists.
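For concreteness, here is a minimal, non-monadic sketch of the Stream machinery described above. The real stream-fusion library adds a monad parameter and a Size hint, and the names below (smap, sfilter) are illustrative:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Each step either finishes, skips (useful for filter), or yields an element.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function plus a hidden state of existential type s.
data Stream a = forall s. Stream (s -> Step s a) s

fromList :: [a] -> Stream a
fromList xs0 = Stream next xs0
  where next []     = Done
        next (x:xs) = Yield x xs

toList :: Stream a -> [a]
toList (Stream next s0) = go s0
  where go s = case next s of
          Done       -> []
          Skip s'    -> go s'
          Yield x s' -> x : go s'

-- map and filter just re-wrap the step function; no list is traversed.
smap :: (a -> b) -> Stream a -> Stream b
smap f (Stream next s0) = Stream next' s0
  where next' s = case next s of
          Done       -> Done
          Skip s'    -> Skip s'
          Yield x s' -> Yield (f x) s'

sfilter :: (a -> Bool) -> Stream a -> Stream a
sfilter p (Stream next s0) = Stream next' s0
  where next' s = case next s of
          Done       -> Done
          Skip s'    -> Skip s'
          Yield x s' | p x       -> Yield x s'
                     | otherwise -> Skip s'

main :: IO ()
main = print (toList (smap (*2) (sfilter even (fromList [1..10]))))
-- prints [4,8,12,16,20]
```

Note how smap . sfilter composes two step transformers over one piece of state: that is the explicit iterator that fusion exposes.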

The advantage of streams is that they are more powerful. The interface:

data Stream m a = forall s . Stream (s -> m (Step s a)) s Size   

lets you do many things that normal lists cannot. Eg:

  • Track the size (eg Unknown, Max 34, Exact 12)
  • Perform monadic actions to get the next element. Lists can partly do this with lazy IO, but that technique has proved error-prone and is generally discouraged outside simple small scripts.
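As an illustration of the monadic case, here is a hedged sketch: the types mirror the interface above minus the Size field, and toListM and countingStream are made-up names. Producing each element performs an IO action, something plain lists cannot safely express:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.IORef

data Step s a = Done | Yield a s
data Stream m a = forall s. Stream (s -> m (Step s a)) s

-- Collect the whole stream, running the effects as elements are pulled.
toListM :: Monad m => Stream m a -> m [a]
toListM (Stream next s0) = go s0
  where go s = do
          step <- next s
          case step of
            Done       -> return []
            Yield x s' -> (x :) <$> go s'

-- A three-element stream whose step function performs IO: it bumps a
-- counter each time an element is demanded.
countingStream :: IORef Int -> Stream IO Int
countingStream ref = Stream next (3 :: Int)
  where next 0 = return Done
        next n = do modifyIORef ref (+1)
                    return (Yield n (n - 1))

main :: IO ()
main = do
  ref   <- newIORef 0
  xs    <- toListM (countingStream ref)
  pulls <- readIORef ref
  print (xs, pulls)
-- prints ([3,2,1],3)
```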

However, streams have a big downside compared to lists: complexity. To understand streams, a beginner has to be comfortable with existential types and monadic actions. Haskell would be much harder to learn if using the basic list type required mastering those two subjects first.

Compare that to lists, which have the interface:

data [] a = a : [a] | []

This is very simple, and something that can be taught easily to a new programmer.

Another advantage of lists is you can pattern match them simply. For example:

getTwo (a : b : _) = Just (a,b)
getTwo _ = Nothing

This is useful both to experienced programmers (I still use list pattern matching in many functions) and to beginners who haven't yet learnt the standard higher order functions for manipulating lists.

Efficiency is also another potential advantage of lists, since ghc has spent a lot of time working on list fusion. In a lot of code, intermediate lists are never generated. That could be a lot harder to optimize with streams.
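GHC's list fusion is based on the foldr/build technique. Here is a hedged sketch of the idea; GHC's real build and rewrite rules live in GHC.Exts and GHC.List, and mybuild and upto below are illustrative names:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A "good producer" abstracts over the list constructors it uses.
mybuild :: (forall b. (a -> b -> b) -> b -> b) -> [a]
mybuild g = g (:) []

-- enumFromTo written as a good producer:
upto :: Int -> Int -> [Int]
upto lo hi = mybuild (\cons nil ->
  let go i | i > hi    = nil
           | otherwise = cons i (go (i + 1))
  in go lo)

-- With a RULES pragma declaring  foldr k z (mybuild g) = g k z,
-- a foldr-based consumer plugs its own operators straight into the
-- producer, so no intermediate list is ever allocated.
main :: IO ()
main = print (sum (map (*2) (upto 1 100)))
-- prints 10100
```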

So I think it would be a poor choice to replace lists with streams. The current situation is better: you can bring streams in when you need them, but beginners aren't stuck with their complexity and experienced users don't have to give up pattern matching.

EDIT: about [1..1000000] :

This is equivalent to enumFromTo 1 1000000 , which is lazily evaluated and subject to fusion (which makes it very efficient). Eg sum [1..1000000] would not generate any intermediate list (and would run in constant memory) with optimisation turned on. So case (2) is effectively what happens, and this situation isn't an advantage for streams, thanks to lazy evaluation and fusion. As noted above though, streams have other advantages over lists.

Short answer: lists and streams are incomparable in power. Streams allow monadic actions but disallow sharing; lists are the other way around.

A longer answer:

  1. See @nanothief's answer for a counterexample which cannot be implemented with lists.
  2. Below is a counterexample which cannot easily be implemented with streams.

The problem is that toy list examples usually don't use the sharing feature of lists. Here is the code:

foo = map heavyFunction bar
baz = take 5 foo
quux = product foo

With lists you compute the heavy function only once. The code to compute baz and quux with streams without extra computations of heavyFunction is going to be hard to maintain.
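To see the sharing concretely, one can instrument heavyFunction with Debug.Trace. The squaring body and the five-element bar here are made up for illustration:

```haskell
import Debug.Trace

-- "heavy!" is printed once per actual evaluation of heavyFunction.
heavyFunction :: Int -> Int
heavyFunction x = trace "heavy!" (x * x)

bar :: [Int]
bar = [1..5]

-- foo is a single shared list of thunks.
foo :: [Int]
foo = map heavyFunction bar

main :: IO ()
main = do
  print (take 3 foo)   -- forces heavyFunction on three elements
  print (product foo)  -- re-uses those three results; only the remaining
                       -- two elements trigger further "heavy!" traces
```

Running this shows five "heavy!" lines in total, not eight: the thunks forced by take 5 foo (baz above) are the same ones product foo (quux) consumes. A stream, having no materialised cells to share, would recompute them.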
