
Concurrent map/foreach in Scala

I have an iterable vals: Iterable[T] and a long-running function without any relevant side effects: f: (T => Unit). Right now this is applied to vals in the obvious way:

vals.foreach(f)

I would like the calls to f to be done concurrently (within reasonable limits). Is there an obvious function somewhere in the Scala base library? Something like:

Concurrent.foreach(8 /* Number of threads. */)(vals, f)

While f is reasonably long running, it is short enough that I don't want the overhead of spawning a new thread for each call, so I am looking for something based on a thread pool.
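
For reference, a minimal sketch of what such a helper might look like with scala.concurrent.Future and a fixed-size pool; the Concurrent object and its foreach signature simply mirror the hypothetical call above and are not from any library:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object Concurrent {
  // Apply f to every element on a fixed-size thread pool and block until all calls finish.
  def foreach[T](threads: Int)(vals: Iterable[T], f: T => Unit): Unit = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try Await.result(Future.traverse(vals)(v => Future(f(v))), Duration.Inf)
    finally pool.shutdown()
  }
}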

Many of the answers from 2009 still use the old scala.actors.Futures._, which is no longer available in newer Scala versions. While Akka is the preferred way, a much more readable approach is to just use parallel ( .par ) collections:

vals.foreach { v => f(v) }

becomes

vals.par.foreach { v => f(v) }

Alternatively, using parMap can appear more succinct, though with the caveat that you need to remember the usual Scalaz imports. As usual, there's more than one way to do the same thing in Scala!
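
If you need to cap the parallelism (the question asks for 8 threads), one way is to give the parallel collection its own pool. A minimal sketch, assuming Scala 2.12, where parallel collections still ship with the standard library and ForkJoinTaskSupport accepts a java.util.concurrent.ForkJoinPool:

import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

val parVals = vals.par
// Limit this parallel collection to 8 worker threads; the pool size is illustrative.
parVals.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(8))
parVals.foreach(f)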

Scalaz has parMap. You would use it as follows:

import scalaz.Scalaz._
import scalaz.concurrent.Strategy.Naive

This will equip every functor (including Iterable) with a parMap method, so you can just do:

vals.parMap(f)

You also get parFlatMap, parZipWith, etc.

I like the Futures answer. However, while it will execute concurrently, it will also return asynchronously, which is probably not what you want. The correct approach would be as follows:

import scala.actors.Futures._

vals map { x => future { f(x) } } foreach { _() }
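
On recent Scala versions the same run-concurrently-then-wait pattern is usually written with scala.concurrent.Future; a rough equivalent sketch:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Start every call concurrently, then block until all of them have completed.
val done = Future.traverse(vals)(v => Future(f(v)))
Await.result(done, Duration.Inf)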

I had some issues using scala.actors.Futures in Scala 2.8 (it was buggy when I checked). Using the Java libraries directly worked for me, though:

object Parallel {
  val cpus = java.lang.Runtime.getRuntime().availableProcessors

  import java.util.{Timer, TimerTask}

  // Schedule op to run once after a delay of ms milliseconds.
  def afterDelay(ms: Long)(op: => Unit) =
    new Timer().schedule(new TimerTask { override def run = op }, ms)

  // Run f(0) .. f(n - 1) on a thread pool and block until every task has finished.
  def repeat(n: Int, f: Int => Unit) = {
    import java.util.concurrent._
    val e = Executors.newCachedThreadPool // or newFixedThreadPool(cpus + 1)
    (0 until n).foreach(i => e.execute(new Runnable { def run = f(i) }))
    e.shutdown()
    e.awaitTermination(Long.MaxValue, TimeUnit.SECONDS)
  }
}
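
Applied to the question, since repeat works on integer indices, a small usage sketch might be:

val indexed = vals.toIndexedSeq
Parallel.repeat(indexed.size, i => f(indexed(i)))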

I use scala.actors.Futures:

vals.foreach(t => scala.actors.Futures.future(f(t)))

The latest release of Functional Java has some higher-order concurrency features that you can use.

import fjs.F._
import fj.control.parallel.Strategy._
import fj.control.parallel.ParModule._
import java.util.concurrent.Executors._

val pool = newCachedThreadPool
val par = parModule(executorStrategy[Unit](pool))

And then...

par.parMap(vals, f)

Remember to shut down the pool.
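
For example, once the parallel work has finished:

// Release the pool's threads once the parallel work is done.
pool.shutdown()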

You can use the Parallel Collections from the Scala standard library. They're just like ordinary collections, but their operations run in parallel. You just need to add a .par call before invoking a collection operation.

import scala.collection._

val array = new Array[String](10000)
for (i <- (0 until 10000).par) array(i) = i.toString
