
Scala: Thread safe mutable lazy Iterator with append

For an immutable flavour, Iterator does the job.

val x = Iterator.fill(100000)(someFn)

Now I want to implement a mutable version of Iterator, with three guarantees:

  • thread-safe on all transformations (fold, foldLeft, ...) and append
  • lazily evaluated
  • traversable only once! Once used, an object from this Iterator should be destroyed.

Is there an existing implementation to give me these guarantees? Any library or framework example would be great.

Update

To illustrate the desired behaviour.

class SomeThing {}
class Test(val list: Iterator[SomeThing]) {
   def add(thing: SomeThing): Test = {
      new Test(list ++ Iterator(thing))
   }
}
new Test(Iterator.empty).add(new SomeThing).add(new SomeThing)

In this example, SomeThing is an expensive construct, so it needs to be lazy.

Re-iterating over list is never required, so Iterator is a good fit.

This is supposed to asynchronously and lazily sequence 10 million SomeThing instances without depleting the executor (a cached thread pool executor) or running out of memory.

You don't need a mutable Iterator for this, just daisy-chain the immutable form:

class SomeThing {}

case class Test(list: Iterator[SomeThing]) {
  // thing is by-name and ++ is lazy, so nothing is constructed until it is reached
  def add(thing: => SomeThing) = Test(list ++ Iterator(thing))
}

Test(Iterator.empty).add(new SomeThing).add(new SomeThing)

Although you don't really need the extra boilerplate of Test here:

Iterator(new SomeThing) ++ Iterator(new SomeThing)

Note that Iterator.++ takes a by-name param, so the ++ operation is already lazy.
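
To make the laziness of ++ visible, here is a minimal sketch. The construction counter `built` and the object name `LazyConcatDemo` are made up for illustration; SomeThing stands in for the expensive class from the question.

```scala
object LazyConcatDemo {
  // Hypothetical stand-in for the expensive SomeThing; counts constructions.
  var built = 0
  class SomeThing { built += 1 }

  // The left operand is built right away. The right operand is passed
  // by-name, so its expression is not evaluated at this point.
  val it: Iterator[SomeThing] = Iterator(new SomeThing) ++ Iterator(new SomeThing)
  val builtAfterConcat: Int = built

  it.next() // consumes the element from the left-hand Iterator
  it.next() // reaching the right-hand side forces its construction
  val builtAfterSecond: Int = built
}
```

Reading `LazyConcatDemo.builtAfterConcat` should give 1 and `LazyConcatDemo.builtAfterSecond` should give 2: the second SomeThing only comes into existence once the traversal reaches it.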

You might also want to try this, to avoid building intermediate Iterators:

Iterator.continually(new SomeThing) take 2
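
A small sketch of why this works for expensive elements: the argument of Iterator.continually is by-name and is re-evaluated on each next(), and take only limits how many times that happens. The counter `built` and the object name are assumptions added for the demonstration.

```scala
object ContinuallyDemo {
  // Hypothetical stand-in for the expensive SomeThing; counts constructions.
  var built = 0
  class SomeThing { built += 1 }

  // Nothing is constructed here: the generator expression is evaluated per next().
  val two: Iterator[SomeThing] = Iterator.continually(new SomeThing).take(2)
  val builtBeforeTraversal: Int = built

  // Traversing the iterator constructs exactly as many elements as were taken.
  val things: List[SomeThing] = two.toList
  val builtAfterTraversal: Int = built
}
```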

UPDATE

If you don't know the size in advance, then I'll often use a tactic like this:

def mkSomething = if (cond) Some(new SomeThing) else None
Iterator.continually(mkSomething) takeWhile (_.isDefined) map { _.get }

The trick is to have your generator function wrap its output in an Option, which then gives you a way to flag that the iteration is finished by returning None.
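
Here is a self-contained sketch of that tactic. The queue of pending ids plays the role of `cond` and is purely hypothetical, as is the `id` field on SomeThing. Note that this generator is not itself thread-safe.

```scala
import scala.collection.mutable

object OptionGeneratorDemo {
  class SomeThing(val id: Int)

  // Hypothetical source of work: three pending ids, then nothing.
  private val pending = mutable.Queue(1, 2, 3)

  // Returns Some(thing) while there is work left, None to signal the end.
  def mkSomething: Option[SomeThing] =
    if (pending.nonEmpty) Some(new SomeThing(pending.dequeue())) else None

  // takeWhile stops at the first None, so map(_.get) only ever sees Some values.
  val ids: List[Int] =
    Iterator.continually(mkSomething).takeWhile(_.isDefined).map(_.get).map(_.id).toList
}
```

The generator is called once more than there are elements, since the final call is the one that returns None and ends the iteration.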

Of course... If you're really pushing out the boat, you can even use the dreaded null:

def mkSomething = if (cond) new SomeThing else null
Iterator.continually(mkSomething) takeWhile (_ != null)

Seems like you need to hide the fact that the iterator is mutable but at the same time allow it to grow mutably. What I'm going to propose is the same sort of trick I've used to speed up ::: in the past:

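 abstract class AppendableIterator[A] extends Iterator[A] {
   // Concrete subclasses supply the initial underlying iterator.
   protected var inner: Iterator[A]
   // Reads take the same lock as append, so they always see the latest inner.
   def hasNext = synchronized { inner.hasNext }
   def next() = synchronized { inner.next() }

   def append(that: Iterator[A]): Unit = synchronized {
     inner = new JoinedIterator(inner, that)
   }
 }

 // You might need to add some more things; this is a skeleton.
 class JoinedIterator[A](first: Iterator[A], second: Iterator[A]) extends Iterator[A] {
   def hasNext = first.hasNext || second.hasNext
   // Drain first, then second; when both are exhausted, fail like an empty Iterator does.
   def next() = if (first.hasNext) first.next() else if (second.hasNext) second.next() else Iterator.empty.next()
 }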

So what you're really doing is leaving the Iterator at whatever place in its iteration you might have it while still preserving the thread safety of the append by "joining" another Iterator in non-destructively. You avoid the need to recompute the two together because you never actually force them through a CanBuildFrom.

This is also a generalization of just adding one item. You can always wrap some A in an Iterator[A] of one element if you so choose.
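
A usage sketch of the idea, assuming a concrete variant that takes its starting iterator as a constructor argument. The name `SimpleAppendable` and that constructor are hypothetical and not part of the skeleton above; the joining class is repeated here so the example is self-contained.

```scala
object AppendDemo {
  class JoinedIterator[A](first: Iterator[A], second: Iterator[A]) extends Iterator[A] {
    def hasNext: Boolean = first.hasNext || second.hasNext
    // Drain first, then second; second.next() throws once both are exhausted.
    def next(): A = if (first.hasNext) first.next() else second.next()
  }

  // Hypothetical concrete variant: starts from `start` and can grow later.
  class SimpleAppendable[A](start: Iterator[A]) extends Iterator[A] {
    private var inner: Iterator[A] = start
    def hasNext: Boolean = synchronized { inner.hasNext }
    def next(): A = synchronized { inner.next() }
    def append(that: Iterator[A]): Unit = synchronized {
      inner = new JoinedIterator(inner, that)
    }
  }

  val it = new SimpleAppendable(Iterator(1, 2))
  val first: Int = it.next()     // consume one element before appending
  it.append(Iterator(3, 4))      // joins onto the partly consumed iterator
  it.append(Iterator.single(5))  // a single item wrapped as a one-element Iterator
  val rest: List[Int] = it.toList
}
```

Two caveats on this sketch: a hasNext followed by next() is still two separate locked calls, so concurrent consumers would need their own coordination, and every append adds one more wrapper layer, so a very large number of appends produces a deeply nested chain.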

Have you looked at the mutable.ParIterable in the collection.parallel package?

To access an iterator over elements you can do something like

val x = ParIterable.fill(100000)(someFn).iterator

From the docs: Parallel operations are implemented with divide and conquer style algorithms that parallelize well. The basic idea is to split the collection into smaller parts until they are small enough to be operated on sequentially.

...

The higher-order functions passed to certain operations may contain side-effects. Since implementations of bulk operations may not be sequential, this means that side-effects may not be predictable and may produce data-races, deadlocks or invalidation of state if care is not taken. It is up to the programmer to either avoid using side-effects or to use some form of synchronization when accessing mutable data.


 