Can someone let me know how to decide --executor-memory and --num-executors in a spark-submit job? What is the concept of --executor-cores?

Also, what is the clear difference between cluster and client deploy mode, and how do I choose a deploy mode?
The first part of your question, where you ask about --executor-memory, --num-executors, and --executor-cores, usually depends on the variety of tasks your Spark application is going to perform.

Each executor splits the code it is given and runs the instructions as tasks. These tasks are performed on executor cores (processors). This helps you achieve parallelism within a given executor, but make sure you don't allocate all of a machine's cores to its executors, because some are needed for the machine's normal functioning.

On to the second part of your question: we have two --deploy-mode
options in Spark, which you have already named: cluster and client.
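To make the first part more concrete, here is a hypothetical back-of-the-envelope sizing for a small YARN cluster. All of the numbers (6 nodes, 16 cores and 64 GB per node, 5 cores per executor) are assumptions for illustration, not recommendations; a real sizing would also budget ~10% of executor memory for `spark.yarn.executor.memoryOverhead` and be tuned against the actual workload:

```shell
# Hypothetical cluster: 6 worker nodes, 16 cores and 64 GB RAM each.
NODES=6
CORES_PER_NODE=16
MEM_PER_NODE_GB=64

# Leave 1 core and 1 GB per node for the OS and Hadoop daemons.
USABLE_CORES=$(( CORES_PER_NODE - 1 ))
USABLE_MEM_GB=$(( MEM_PER_NODE_GB - 1 ))

# ~5 cores per executor is a common rule of thumb, not a hard limit.
CORES_PER_EXECUTOR=5
EXECUTORS_PER_NODE=$(( USABLE_CORES / CORES_PER_EXECUTOR ))

# Reserve one executor slot cluster-wide for the YARN ApplicationMaster.
NUM_EXECUTORS=$(( NODES * EXECUTORS_PER_NODE - 1 ))
MEM_PER_EXECUTOR_GB=$(( USABLE_MEM_GB / EXECUTORS_PER_NODE ))

echo "--num-executors $NUM_EXECUTORS" \
     "--executor-cores $CORES_PER_EXECUTOR" \
     "--executor-memory ${MEM_PER_EXECUTOR_GB}g"
# → --num-executors 17 --executor-cores 5 --executor-memory 21g
```

The point is not the exact numbers but the shape of the reasoning: subtract the per-node overhead first, then divide what is left among executors.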
client mode is when you connect an external machine to a cluster and run a Spark job from that external machine, for example when you connect your laptop to a cluster and run spark-shell from it. The driver JVM is invoked on your laptop, and the session is killed as soon as you disconnect the laptop. The case is similar for a spark-submit job: if you run it with --deploy-mode client, your laptop hosts the driver, and the job is killed as soon as the laptop is disconnected.

cluster
mode: when you specify --deploy-mode cluster for your job, then even if you launch it from your laptop or any other machine, the job (the JAR) is taken care of by the ResourceManager and ApplicationMaster, just like any other YARN application. You won't be able to see the output on your screen, but most non-trivial Spark jobs write their results to a filesystem anyway, so the output is taken care of that way.
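To illustrate the difference, here is how the same application might be submitted in each mode. These invocations are a sketch: `com.example.MyApp` and `my-app.jar` are made-up placeholder names, not part of the original question.

```shell
# client mode: the driver JVM starts on the submitting machine; driver-side
# output comes back to this terminal, and the job dies if this machine
# disconnects from the cluster.
spark-submit --master yarn --deploy-mode client \
  --class com.example.MyApp my-app.jar

# cluster mode: the driver runs inside the cluster under the YARN
# ApplicationMaster; you can disconnect your laptop once the job is
# accepted, and you read the results from wherever the job writes them.
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyApp my-app.jar
```

As a rule of thumb, client mode suits interactive work (spark-shell, debugging), while cluster mode suits production jobs that should survive the submitting machine going away.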