
How to execute multiple functions in parallel in Spark using Scala?

How to execute multiple functions in parallel in a Spark batch job using Scala? For example:

 def main(args: Array[String]): Unit = {
   def func1(): Unit = {
     // dataframe 1 write to oracle database table 1
   }
   def func2(): Unit = {
     // dataframe 2 write to oracle database table 2
   }
   def func3(): Unit = {
     // dataframe 3 write to oracle database table 3
   }
 }

In general, concurrency can be achieved using Futures. Following the example below, you can try it on your own.

See Concurrency in Spark.

/** A singleton object that controls parallelism on a single executor JVM, using the global ExecutionContext */
object ConcurrentContext {
  import scala.util._
  import scala.concurrent._
  import scala.concurrent.ExecutionContext.Implicits.global
  /** Wraps a code block in a Future and returns the future */
  def executeAsync[T](f: => T): Future[T] = {
    Future(f)
  }
}

and then

scala> sc.parallelize( 1 to 10).map(fastFoo).map(x => ConcurrentContext.executeAsync(slowFoo(x))).collect
fastFoo(1)
fastFoo(2)
fastFoo(3)
fastFoo(4)
slowFoo start (2)
slowFoo start (1)
fastFoo(5)
slowFoo start (3)
  ...
res6: Array[scala.concurrent.Future[Int]] = Array(List(), List(), List(), List(), List(), List(), List(), List(), List(), List())

scala>  // Our request returns
//Then 5 seconds later
slowFoo end(1)
slowFoo end(7)
slowFoo end(8)
slowFoo end(4)
slowFoo start (10)
slowFoo end(5)
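
Applied to the original question, here is a minimal driver-side sketch (assuming func1, func2, and func3 are the DataFrame-writing functions from the question, and that blocking until all three writes finish is acceptable). Each call is wrapped in a Future so the driver can submit the writes concurrently, then all of them are awaited together:

import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def main(args: Array[String]): Unit = {
  def func1(): Unit = { /* dataframe 1 write to oracle database table 1 */ }
  def func2(): Unit = { /* dataframe 2 write to oracle database table 2 */ }
  def func3(): Unit = { /* dataframe 3 write to oracle database table 3 */ }

  // Start all three writes concurrently on the driver's thread pool.
  val futures = Seq(Future(func1()), Future(func2()), Future(func3()))

  // Block until every write has completed (a failure in any Future propagates here).
  Await.result(Future.sequence(futures), Duration.Inf)
}

Note that each write is still a distributed Spark job; the Futures only let the driver submit the three jobs concurrently instead of one after another.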
