Let's say you have a program which manipulates a stream Stream[Foo] in some manner to produce a computation of interest, e.g.
myFooStream.map(toBar).groupBy(identity).mapValues(_.size)
Lovely, except now you need to do some other kind of computation on myFooStream, like
myFooStream.map(toBar).sum
And you'd like to compose these computations somehow so that you do not need to iterate twice over the stream (let's say that iterating over the stream is expensive for some reason).
Is there some Scala-ish way of dealing with this problem? My problem, put more abstractly, is that I'd like to separate the computation over these streams from the iteration over these streams. That is, what would be best is if I could somehow write two methods f: Stream[Foo] => Bar and g: Stream[Foo] => Baz and somehow compose f and g in such a way that they operate on a single iteration of the stream.
Is there some abstraction which allows this?
UPDATED QUESTION: I've done a little digging around. Would scalaz arrows be helpful with this problem?
Streams naturally try to avoid generating their elements multiple times if possible, by memoizing results. From the docs:
The Stream class also employs memoization such that previously computed values are converted from Stream elements to concrete values of type A.
We can see that by constructing a Stream that prints every time an element is produced, and running multiple operations:
val stream = Stream.from(0).map(x => { println(x); x }).take(10) //prints 0
val double = stream.map(_ * 2).take(5).toList //prints 1 through 4
val sum = stream.sum //prints 5 through 9
val sum2 = stream.sum //doesn't print any more
This works as long as you use a val and not a def:
So long as something is holding on to the head, the head holds on to the tail, and so it continues recursively. If, on the other hand, there is nothing holding on to the head (e.g. we used def to define the Stream) then once it is no longer being used directly, it disappears.
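The val versus def difference can be checked with a counter instead of println. The following is only a minimal sketch, assuming Scala 2.12 or 2.13 (where Stream is still available, though deprecated in 2.13); the names ValVsDef, fresh, sumTwiceViaVal and sumTwiceViaDef are hypothetical and not part of the original answer.

```scala
object ValVsDef {
  // Counts how many times an element has been generated.
  var evaluations = 0

  // Hypothetical helper: builds a brand-new three-element Stream on every call.
  def fresh: Stream[Int] =
    Stream.from(0).map { x => evaluations += 1; x }.take(3)

  // Holds the Stream in a val, so the second traversal reuses memoized elements.
  def sumTwiceViaVal(): Int = {
    evaluations = 0
    val kept = fresh
    kept.sum
    kept.sum
    evaluations
  }

  // Calls the def twice, so each traversal regenerates every element.
  def sumTwiceViaDef(): Int = {
    evaluations = 0
    fresh.sum
    fresh.sum
    evaluations
  }
}
```

With these definitions, ValVsDef.sumTwiceViaVal() should report one generation per element, while ValVsDef.sumTwiceViaDef() should report twice as many.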
This memoization means one must be cautious with Streams:
One must be cautious of memoization; you can very quickly eat up large amounts of memory if you're not careful. The reason for this is that the memoization of the Stream creates a structure much like scala.collection.immutable.List.
Of course, if generating the items isn't what is expensive, but rather the actual traversal of the Stream, or if memoization isn't available because it would be too expensive, one can always use foldLeft with a tuple, keeping track of multiple values:
//Only prints 0-9 once, even if stream is a def
val (sum, double) = stream.foldLeft(0 -> List.empty[Int]) {
case ((sum, list), next) => (sum + next, list :+ (next * 2))
}
If this is a common enough operation, you might even enrich Stream to make some of the more common operations like foldLeft, reduceLeft, and others available in this format:
implicit class RichStream[T](val stream: Stream[T]) extends AnyVal {
def doubleFoldLeft[A, B](start1: A, start2: B)(f: (A, T) => A, g: (B, T) => B) = stream.foldLeft(start1 -> start2) {
case ((aAcc, bAcc), next) => (f(aAcc, next), g(bAcc, next))
}
}
Which would allow you to do things like:
val (sum, double) = stream.doubleFoldLeft(0, List.empty[Int])(_ + _, (acc, x) => acc :+ (x * 2))
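The same idea can be pushed a little further towards what the question asks for: describing each computation separately and composing them before any iteration happens. What follows is a rough sketch rather than an established library API. The Fold case class, its zip and run methods, and the FoldDemo object are all hypothetical names introduced here for illustration.

```scala
// Hypothetical abstraction: a fold described as data (initial state, step, final extraction).
final case class Fold[T, S, R](init: S, step: (S, T) => S, done: S => R) {

  // Pair this fold with another one so that both advance on every element.
  def zip[S2, R2](that: Fold[T, S2, R2]): Fold[T, (S, S2), (R, R2)] =
    Fold[T, (S, S2), (R, R2)](
      (init, that.init),
      { case ((s1, s2), t) => (step(s1, t), that.step(s2, t)) },
      { case (s1, s2) => (done(s1), that.done(s2)) }
    )

  // Run the fold with a single foldLeft over the input.
  def run(xs: Iterable[T]): R = done(xs.foldLeft(init)(step))
}

object FoldDemo {
  // Sums the elements.
  val sumFold = Fold[Int, Int, Int](0, _ + _, identity)

  // Counts occurrences of each element, similar to groupBy(identity).mapValues(_.size).
  val countsFold = Fold[Int, Map[Int, Int], Map[Int, Int]](
    Map.empty,
    (m, x) => m.updated(x, m.getOrElse(x, 0) + 1),
    identity
  )

  // Both results come out of one traversal of the Stream.
  val (total, counts) = sumFold.zip(countsFold).run(Stream(1, 2, 2, 3))
}
```

The design choice here is that f and g are written as independent Fold values and only combined with zip at the call site, so the traversal itself stays in one place (run).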
The stream will not iterate twice:
Stream.continually{println("bob"); 1}.take(4).map(v => v).sum
bob
bob
bob
bob
4
and
val bobs = Stream.continually{println("bob"); 1}.take(4)
val alices = Stream.continually{println("alice"); 2}.take(4)
bobs.zip(alices).map{ case (b, a) => a + b}.sum
bob
bob
bob
bob
alice
alice
alice
alice
12