How to execute multiple functions in parallel in Spark using Scala?
How can I execute multiple functions in parallel in a Spark batch job using Scala?
def main(args: Array[String]): Unit = {
  def func1(): Unit = {
    // dataframe 1 write to oracle database table 1
  }
  def func2(): Unit = {
    // dataframe 2 write to oracle database table 2
  }
  def func3(): Unit = {
    // dataframe 3 write to oracle database table 3
  }
}
In general, concurrency can be achieved using Futures. The ConcurrentContext example below is a starting point you can adapt and try on your own.
See also: Concurrency in Spark.
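Applied directly to the three functions in the question, the Futures approach might look like the sketch below, which runs on the driver. The object name ParallelWrites, the helper runAll, the return strings, and the sleep calls are placeholders invented for illustration; in a real job each function body would perform its own DataFrame write (for example via df.write.jdbc), which is assumed here rather than shown.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelWrites {
  // Hypothetical stand-ins for the three write functions in the question.
  // Thread.sleep simulates the time a real database write would take.
  def func1(): String = { Thread.sleep(200); "table 1 written" }
  def func2(): String = { Thread.sleep(200); "table 2 written" }
  def func3(): String = { Thread.sleep(200); "table 3 written" }

  def runAll(): Seq[String] = {
    // Each Future is scheduled on the global thread pool as soon as it is created,
    // so the three functions can run concurrently.
    val writes = Seq(Future(func1()), Future(func2()), Future(func3()))
    // Block until all three have finished; without this, main could return
    // (and the application could shut down) before the work completes.
    Await.result(Future.sequence(writes), 10.minutes)
  }

  def main(args: Array[String]): Unit =
    runAll().foreach(println)
}
```

The 10-minute timeout is arbitrary. If any of the functions throws, Await.result rethrows that exception, so a failed write is not silently lost.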
/** A singleton object that controls the parallelism on a single executor JVM, using the global ExecutionContext */
object ConcurrentContext {
  import scala.util._
  import scala.concurrent._
  import scala.concurrent.ExecutionContext.Implicits.global

  /** Wraps a code block in a Future and returns the Future */
  def executeAsync[T](f: => T): Future[T] = {
    Future(f)
  }
}
and then:
scala> sc.parallelize( 1 to 10).map(fastFoo).map(x => ConcurrentContext.executeAsync(slowFoo(x))).collect
fastFoo(1)
fastFoo(2)
fastFoo(3)
fastFoo(4)
slowFoo start (2)
slowFoo start (1)
fastFoo(5)
slowFoo start (3)
...
res6: Array[scala.concurrent.Future[Int]] = Array(List(), List(), List(), List(), List(), List(), List(), List(), List(), List())
scala> // Our request returns
//Then 5 seconds later
slowFoo end(1)
slowFoo end(7)
slowFoo end(8)
slowFoo end(4)
slowFoo start (10)
slowFoo end(5)
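The transcript shows the request returning before the slowFoo calls finish, because creating a Future does not wait for its result. A minimal plain-Scala sketch of the same effect, and of waiting for the results explicitly, is given below. The object AwaitDemo and this version of slowFoo are hypothetical stand-ins, since the original slowFoo is not shown.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AwaitDemo {
  // Hypothetical slow function standing in for slowFoo from the transcript.
  def slowFoo(x: Int): Int = { Thread.sleep(100); x * 2 }

  def run(): List[Int] = {
    // Creating the futures returns immediately; the work continues in the background.
    val pending: List[Future[Int]] = (1 to 10).toList.map(x => Future(slowFoo(x)))
    // Future.sequence combines them into one Future, and Await.result blocks
    // until every slowFoo call has completed, preserving the input order.
    Await.result(Future.sequence(pending), 1.minute)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```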