
How to apply multiple methods at a time in Spark?

df is a dataframe that contains all the car data (| id | time | speed | gps | ... |);

trips is a list of (id, start, end) tuples generated from df.

method1 is used to get each id's stats information. method2 is used to get each id's other stats information.

Like this code:

val a = method1(trips,df,sc)
val b = method2(trips,df,sc)
val c = method3(trips,df,sc)
val d = method4(trips,df,sc)
val e = method5(trips,df,sc)
val f = method6(trips,df,sc)

Because each method takes a certain amount of time, is there any way to run these assignments at the same time? The type of a, b, ..., f is DataFrame.

Yes, you can run multiple jobs at the same time in a Spark cluster with the help of asynchronous actions like collectAsync(), countAsync(), etc.
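For illustration, here is a minimal sketch of an asynchronous action, assuming an existing SparkContext named sc: collectAsync() submits the job and returns a FutureAction right away, leaving the driver thread free to submit other jobs.

import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.FutureAction

// collectAsync() submits the job and returns without waiting for it to finish.
val rdd = sc.parallelize(1 to 100)
val futureResult: FutureAction[Seq[Int]] = rdd.collectAsync()

// The callback fires when the job completes; the driver is not blocked meanwhile.
futureResult.foreach(result => println(s"collected ${result.size} elements"))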

You just set the scheduler mode on the Spark configuration: .set("spark.scheduler.mode", "FAIR")
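Note that spark.scheduler.mode is read when the context starts, so it has to be set on the SparkConf before the SparkContext is created. A minimal sketch (the app name is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// The FAIR scheduler lets concurrently submitted jobs share executors
// instead of running strictly one after another (the default FIFO mode).
val conf = new SparkConf()
  .setAppName("parallel-stats") // placeholder name
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)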

Then use asynchronous actions so that all jobs run asynchronously. Each returns a Future, so if your methods also return Futures, all the methods run at the same time, as shown in the sketch below.
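Here is a sketch of that idea applied to the question's code, assuming method1 through method6, trips, df, and sc exist as described: each call is wrapped in a scala.concurrent.Future, so the six Spark jobs are submitted concurrently and the FAIR scheduler lets them share the cluster.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Submit all six jobs concurrently; each Future runs on a driver-side thread.
val futures = Seq(
  Future(method1(trips, df, sc)),
  Future(method2(trips, df, sc)),
  Future(method3(trips, df, sc)),
  Future(method4(trips, df, sc)),
  Future(method5(trips, df, sc)),
  Future(method6(trips, df, sc))
)

// Block until every job has finished and collect the six dataframes.
val Seq(a, b, c, d, e, f) = Await.result(Future.sequence(futures), Duration.Inf)

The default global ExecutionContext sizes its thread pool to the number of CPU cores, which is usually enough for six concurrent submissions; for many more, a dedicated thread pool may be needed.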
