
Composing Operations on Streams in Scala

Let's say you have a program that manipulates a Stream[Foo] in some manner to produce a computation of interest, e.g.

myFooStream.map(toBar).groupBy(identity).mapValues(_.size)

Lovely, except now you've got to do some other kind of computation on myFooStream like

myFooStream.map(toBar).sum

And you'd like to compose these computations somehow so that you do not need to iterate twice over the stream (let's say that iterating over the stream is expensive for some reason).

Is there some Scala-ish way of dealing with this problem? My problem, put more abstractly, is that I'd like to separate the computation over these streams from the iteration over them. That is, what would be best is if I could write two methods f: Stream[Foo] => Bar and g: Stream[Foo] => Baz and compose f and g in such a way that they operate on a single iteration of the stream.

Is there some abstraction which allows this?

UPDATED QUESTION: I've done a little digging around. Would scalaz arrows be helpful with this problem?

Streams naturally try to avoid generating their elements multiple times if possible, by memoizing results. From the docs:

The Stream class also employs memoization such that previously computed values are converted from Stream elements to concrete values of type A.

We can see that by constructing a Stream that prints every time an element is produced, and running multiple operations on it:

val stream = Stream.from(0).map(x => { println(x); x }).take(10) //prints 0
val double = stream.map(_ * 2).take(5).toList //prints 1 through 4
val sum = stream.sum //prints 5 through 9
val sum2 = stream.sum //doesn't print any more

This works as long as you use a val and not a def:

So long as something is holding on to the head, the head holds on to the tail, and so it continues recursively. If, on the other hand, there is nothing holding on to the head (e.g. we used def to define the Stream) then once it is no longer being used directly, it disappears.

This memoization means one must be cautious with Streams:

One must be cautious of memoization; you can very quickly eat up large amounts of memory if you're not careful. The reason for this is that the memoization of the Stream creates a structure much like scala.collection.immutable.List.
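The difference between val and def can be observed with a counter instead of println. The following is only a sketch: MemoDemo, build, asVal, asDef and countGenerated are hypothetical names introduced for illustration, and it assumes the same Stream class used above (deprecated in favour of LazyList since Scala 2.13).

```scala
object MemoDemo {
  // Counts how many times an element is actually generated.
  var generated = 0

  // Hypothetical helper: builds the instrumented ten-element stream.
  private def build(): Stream[Int] =
    Stream.from(0).map { x => generated += 1; x }.take(10)

  // val: the stream is built once and its evaluated cells are kept.
  val asVal: Stream[Int] = build()

  // def: every reference builds a fresh, mostly unevaluated stream.
  def asDef: Stream[Int] = build()

  // Runs `body` and reports how many elements it caused to be generated.
  def countGenerated(body: => Any): Int = {
    val before = generated
    body
    generated - before
  }
}
```

Summing asVal a second time should generate nothing new, while every sum over asDef generates all ten elements again.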

Of course, if generating the items isn't what is expensive but the actual traversal of the Stream is, or if memoization isn't an option because it would use too much memory, one can always use foldLeft with a tuple to keep track of multiple values:

//Only prints 0-9 once, even if stream is a def
val (sum, double) = stream.foldLeft(0 -> List.empty[Int]) { 
    case ((sum, list), next) => (sum + next, list :+ (next * 2)) 
}
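Applied to the two computations from the question, the same tuple trick might look like the sketch below. Foo and toBar are hypothetical stand-ins (the question does not define them), and countsAndSum is a name made up for this example.

```scala
object SinglePass {
  // Hypothetical stand-ins for the question's Foo and toBar.
  final case class Foo(n: Int)
  def toBar(foo: Foo): Int = foo.n % 3

  // One traversal: accumulate the occurrence counts (the groupBy/size result)
  // and the running sum together in a single tuple.
  def countsAndSum(foos: Stream[Foo]): (Map[Int, Int], Int) =
    foos.foldLeft((Map.empty[Int, Int], 0)) {
      case ((counts, sum), foo) =>
        val bar = toBar(foo) // toBar runs once per element
        (counts.updated(bar, counts.getOrElse(bar, 0) + 1), sum + bar)
    }
}
```

The trade-off is that both computations are now written inside one function, which is what the enrichment further down tries to loosen.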

If this is a common enough operation, you might even enrich Stream to make paired versions of the more common operations, such as foldLeft, reduceLeft and others, available in this format:

implicit class RichStream[T](val stream: Stream[T]) extends AnyVal {
    def doubleFoldLeft[A, B](start1: A, start2: B)(f: (A, T) => A, g: (B, T) => B) = stream.foldLeft(start1 -> start2) { 
        case ((aAcc, bAcc), next) => (f(aAcc, next), g(bAcc, next)) 
    }
}

Which would allow you to do things like:

val (sum, double) = stream.doubleFoldLeft(0, List.empty[Int])(_ + _, (acc, x) => acc :+ (x * 2))
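Going one step further, each computation can be turned into a value of its own and two such values combined before the stream is traversed, which is closer to the "compose f and g" wording of the question. This is a minimal sketch and not a library API: Fold, zip, run, sumFold, doubledFold and both are all hypothetical names.

```scala
object Folds {
  // Hypothetical Fold: a seed, a step function, and a final extraction.
  // S is the internal accumulator type, R the result type.
  final case class Fold[T, S, R](zero: S, step: (S, T) => S, done: S => R) {
    // Pair two folds so both advance on every element of one traversal.
    def zip[S2, R2](that: Fold[T, S2, R2]): Fold[T, (S, S2), (R, R2)] =
      Fold[T, (S, S2), (R, R2)](
        (zero, that.zero),
        { case ((s1, s2), t) => (step(s1, t), that.step(s2, t)) },
        { case (s1, s2) => (done(s1), that.done(s2)) }
      )

    // A single foldLeft over the input, whatever has been zipped together.
    def run(xs: Iterable[T]): R = done(xs.foldLeft(zero)(step))
  }

  val sumFold = Fold[Int, Int, Int](0, _ + _, s => s)

  // Accumulates in a Vector (cheap appends), then converts to a List.
  val doubledFold = Fold[Int, Vector[Int], List[Int]](
    Vector.empty, (acc, x) => acc :+ (x * 2), _.toList)

  val both = sumFold.zip(doubledFold)
}
```

With this, Folds.both.run(stream) would return the sum and the doubled list as a pair from one pass, and further folds can be zipped on without touching the existing ones.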

Chaining operations such as map and sum does not make the stream iterate twice; each element is generated only once:

Stream.continually{println("bob"); 1}.take(4).map(v => v).sum
bob
bob
bob
bob
4

and

val bobs = Stream.continually{println("bob"); 1}.take(4)
val alices = Stream.continually{println("alice"); 2}.take(4)
bobs.zip(alices).map{ case (b, a) => a + b}.sum
bob
bob
bob
bob
alice
alice
alice
alice
12
