
What is the difference between Spark scheduling mode and application queue in Spark?

While testing the behavior of Spark jobs when multiple jobs are submitted to run concurrently, or when smaller jobs are submitted later, I came across two settings in the Spark UI. One is the scheduling mode available within Spark, as shown in the image below.

[Image: scheduling mode in the Spark UI]

And the other is under Scheduler, as shown below. [Image: scheduler setting]

I want to understand the difference between the two settings, and how preemption fits in. My requirement is that while a bigger job is running, smaller jobs submitted in between must get resources without waiting too long.

Let me explain it for the Spark on YARN mode.

When you submit Scala code to Spark, the Spark client interacts with YARN and launches a YARN application. This application is responsible for all the jobs in your Scala code. In most cases, each job corresponds to a Spark action such as reduce() or collect(). Then the problem comes: how should the different jobs inside this application be scheduled, for example when 3 concurrent jobs appear and wait for execution? To deal with this, Spark defines scheduling rules for jobs, namely FIFO and FAIR. That is to say, the Spark scheduler (FIFO or FAIR) works at the level of jobs, and it is the Spark ApplicationMaster that does this scheduling work, as sketched below.
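Here is a minimal sketch of turning on the job-level FAIR scheduler, using Spark's documented spark.scheduler.mode and spark.scheduler.allocation.file properties; the pool name "small-jobs" and the file path are hypothetical examples:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Enable FAIR job scheduling inside this one application (the default is FIFO).
val conf = new SparkConf()
  .set("spark.scheduler.mode", "FAIR")
  // Optional: pool definitions (weight, minShare) live in a fairscheduler.xml
  // allocation file; the path below is a placeholder.
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")

val spark = SparkSession.builder().appName("fair-demo").config(conf).getOrCreate()

// Route jobs triggered from this thread into a named pool ("small-jobs" is
// a hypothetical pool name that would be defined in fairscheduler.xml).
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "small-jobs")
```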

But YARN's scheduler works at the level of containers. YARN does not care what runs inside a container; it may be a Mapper task, a Reducer task, a Spark driver process, a Spark executor process, and so on. For example, your MapReduce job may currently be asking for 10 containers, each needing (10g memory, 2 vcores), while your Spark application is currently asking for 4 containers, each needing (10g memory, 2 vcores). YARN has to decide how many containers are available in the cluster and how much resource should be allocated to each request according to a rule; this rule is YARN's scheduler, which includes the FairScheduler and the CapacityScheduler.
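By contrast, the YARN queue an application runs in is chosen at submission time. Below is a minimal sketch, assuming the cluster admin has already defined a queue named "small_jobs" (a hypothetical name) in the YARN scheduler configuration; spark.yarn.queue is Spark's documented property for this, equivalent to the --queue flag of spark-submit:

```scala
import org.apache.spark.sql.SparkSession

// Submit this application into a specific YARN queue. The queue name
// "small_jobs" is hypothetical and must already exist in the cluster's
// CapacityScheduler or FairScheduler configuration.
val spark = SparkSession.builder()
  .appName("small-job-app")
  .config("spark.yarn.queue", "small_jobs")
  .getOrCreate()
```

Whether a small application in such a queue can take resources back from a bigger one is then governed by the YARN scheduler's own settings (for example, preemption in the FairScheduler), not by anything inside Spark.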

In general, your Spark application asks YARN for several containers, and YARN's scheduler decides how many containers can currently be allocated to your Spark application. After these containers are allocated, the Spark ApplicationMaster decides how to distribute them among its jobs.

Below is the official documentation about the Spark scheduler: https://spark.apache.org/docs/2.0.0-preview/job-scheduling.html#scheduling-within-an-application

I think spark.scheduler.mode (FAIR/FIFO), shown in the figure, is for scheduling tasksets (sets of tasks from the same stage) submitted to the TaskScheduler using a FAIR or FIFO policy. These tasksets belong to the same job.

To be able to run jobs concurrently, execute each job (transformations + action) in a separate thread: when a job is submitted to the DAGScheduler, the submitting thread is blocked until the job completes and the result is returned or saved. A sketch of this pattern follows.
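Here is a minimal sketch of that pattern, assuming FAIR mode is enabled as shown earlier (the pool names "big-jobs" and "small-jobs" are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("concurrent-jobs")
  .config("spark.scheduler.mode", "FAIR")
  .getOrCreate()
val sc = spark.sparkContext

// Each thread triggers its own job; setLocalProperty is per-thread, so each
// job lands in its own (hypothetical) scheduler pool.
val big = new Thread(() => {
  sc.setLocalProperty("spark.scheduler.pool", "big-jobs")
  val n = sc.parallelize(1 to 10000000, 200).map(i => math.sqrt(i.toDouble)).count()
  println(s"big job done: $n")
})
val small = new Thread(() => {
  sc.setLocalProperty("spark.scheduler.pool", "small-jobs")
  val s = sc.parallelize(1 to 100).sum()
  println(s"small job done: $s")
})
big.start(); small.start()   // both jobs are now in flight concurrently
big.join(); small.join()
spark.stop()
```

With FIFO mode the first job gets priority on all available resources; with FAIR sharing, a short job submitted while a long job is running can start receiving resources right away.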
