
How to apply multiple methods at a time in Spark?

df is a dataframe that contains all the car data (| id | time | speed | gps | ... |);

trips is a list of (id, start, end) tuples generated from df.

method1 is used to get each id's stats information. method2 is used to get each id's other stats information.

Like this code:

val a = method1(trips,df,sc)
val b = method2(trips,df,sc)
val c = method3(trips,df,sc)
val d = method4(trips,df,sc)
val e = method5(trips,df,sc)
val f = method6(trips,df,sc)

Because each method takes a certain amount of time, is there any way to run these assignments at the same time? The type of a, b, ..., f is DataFrame.

Yes, you can run multiple jobs at the same time in a Spark cluster with the help of asynchronous actions like collectAsync(), countAsync(), etc.
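For illustration, here is a minimal sketch of an asynchronous action, assuming an existing SparkContext named sc: collectAsync() submits the job and returns a FutureAction right away, leaving the driver thread free to submit other jobs.

import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.FutureAction

// collectAsync() submits the job and returns without waiting for it to finish.
val rdd = sc.parallelize(1 to 100)
val futureResult: FutureAction[Seq[Int]] = rdd.collectAsync()

// The callback fires when the job completes; the driver is not blocked meanwhile.
futureResult.foreach(result => println(s"collected ${result.size} elements"))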

You just set the scheduler mode on the Spark configuration: .set("spark.scheduler.mode", "FAIR")
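Note that spark.scheduler.mode is read when the context starts, so it has to be set on the SparkConf before the SparkContext is created. A minimal sketch (the app name is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// The FAIR scheduler lets concurrently submitted jobs share executors
// instead of running strictly one after another (the default FIFO mode).
val conf = new SparkConf()
  .setAppName("parallel-stats") // placeholder name
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)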

Then use asynchronous actions so that all jobs run asynchronously. Each returns a Future, so if your methods also return Futures, all the methods run at the same time, as shown in the sketch below.
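Here is a sketch of that idea applied to the question's code, assuming method1 through method6, trips, df, and sc exist as described: each call is wrapped in a scala.concurrent.Future, so the six Spark jobs are submitted concurrently and the FAIR scheduler lets them share the cluster.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Submit all six jobs concurrently; each Future runs on a driver-side thread.
val futures = Seq(
  Future(method1(trips, df, sc)),
  Future(method2(trips, df, sc)),
  Future(method3(trips, df, sc)),
  Future(method4(trips, df, sc)),
  Future(method5(trips, df, sc)),
  Future(method6(trips, df, sc))
)

// Block until every job has finished and collect the six dataframes.
val Seq(a, b, c, d, e, f) = Await.result(Future.sequence(futures), Duration.Inf)

The default global ExecutionContext sizes its thread pool to the number of CPU cores, which is usually enough for six concurrent submissions; for many more, a dedicated thread pool may be needed.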
