How to apply multiple methods at a time in Spark?
df is a DataFrame containing all car data (| id | time | speed | gps | ... |);
trips is a list of (id, start, end) tuples generated from df.
method1 is used to get each id's statistics; method2 is used to get each id's other statistics, and so on.
Like this code:
val a = method1(trips,df,sc)
val b = method2(trips,df,sc)
val c = method3(trips,df,sc)
val d = method4(trips,df,sc)
val e = method5(trips,df,sc)
val f = method6(trips,df,sc)
Because each method takes a certain amount of time, is there any way to run these method calls at the same time? The type of a, b, ..., f is DataFrame.
Yes, you can run multiple jobs at the same time in a Spark cluster with the help of asynchronous actions such as collectAsync(), countAsync(), etc.
Just set the scheduler mode on the Spark configuration: .set("spark.scheduler.mode", "FAIR").
When you use asynchronous actions, every job runs asynchronously and returns a Future; if your methods likewise return Futures, all of them can run at the same time.
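As a minimal configuration sketch (the app name is an assumption), the FAIR scheduler mode is set on the SparkConf before the context is created:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Enable the FAIR scheduler so concurrently submitted jobs
// share cluster resources instead of queueing FIFO.
val conf = new SparkConf()
  .setAppName("parallel-stats") // hypothetical app name
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)
```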
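The pattern above can be sketched with plain Scala Futures on the driver side. Here method1 and method2 are hypothetical stand-ins (simple Int functions instead of the real (trips, df, sc) => DataFrame methods) so the sketch runs without a Spark cluster; in real code each Future body would submit a Spark job, and with spark.scheduler.mode=FAIR those jobs share the cluster:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-ins for method1..method6; in the real code each
// would take (trips, df, sc) and return a DataFrame.
def method1(x: Int): Int = x + 1
def method2(x: Int): Int = x * 2

// Wrapping each call in a Future submits the work concurrently
// instead of waiting for the previous call to finish.
val fa = Future { method1(10) }
val fb = Future { method2(10) }

// Combine the futures and block once for all results
// (Await is used here only to keep the example self-contained).
val (a, b) = Await.result(fa.zip(fb), 30.seconds)
```

The same shape extends to all six methods, e.g. by zipping or using Future.sequence over a list of futures and unpacking the results.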