
Can someone let me know how to decide --executor-memory and --num-executors in a spark-submit job? What is the concept of --executor-cores?

How to decide the --executor-memory and --num-executors in a spark-submit job? What is the concept of --executor-cores?

Also, what is the clear difference between cluster and client deploy mode, and how do you choose the deploy mode?

The first part of your question, where you ask about --executor-memory, --num-executors and --num-executor-cores, usually depends on the variety of tasks your Spark application is going to perform.

  • Executor memory is the amount of physical memory you want to allocate to the JVM that runs each executor. The value depends on your requirements. For example, if you're just going to parse a large text file, you'll need much less memory than for, say, image processing.
  • The number of executors is the number of executor JVMs you want to spawn on your cluster. Again, it depends on many factors such as your cluster size, the type of machines in the cluster, and so on.
  • Each executor splits the work into tasks, and those tasks run on the executor's cores (processors). This helps you achieve parallelism within a given executor, but make sure you don't allocate all of a machine's cores to its executor, because some are needed for the machine's normal functioning.
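As a rough illustration of how these three settings interact, here is a sketch of one commonly cited sizing heuristic (the node counts and sizes below are hypothetical, and the 5-cores-per-executor figure is a rule of thumb, not a rule): leave one core and some memory per node for the OS and daemons, cap cores per executor at about 5, and reserve one executor's worth of resources for the YARN ApplicationMaster.

```shell
#!/bin/sh
# Hypothetical cluster: 6 nodes, each with 16 cores and 64 GB RAM.
NODES=6
CORES_PER_NODE=16
MEM_PER_NODE_GB=64

# Leave 1 core and 1 GB per node for the OS and Hadoop daemons.
usable_cores=$((CORES_PER_NODE - 1))
usable_mem=$((MEM_PER_NODE_GB - 1))

# Rule of thumb: ~5 cores per executor for good HDFS throughput.
CORES_PER_EXECUTOR=5
executors_per_node=$((usable_cores / CORES_PER_EXECUTOR))

# Reserve one executor slot cluster-wide for the YARN ApplicationMaster.
num_executors=$((NODES * executors_per_node - 1))

# Split each node's memory across its executors, then subtract ~7%
# for off-heap overhead (spark.executor.memoryOverhead).
mem_per_executor=$((usable_mem / executors_per_node))
executor_memory=$((mem_per_executor * 93 / 100))

echo "--num-executors $num_executors \
--executor-cores $CORES_PER_EXECUTOR \
--executor-memory ${executor_memory}G"
```

For this hypothetical cluster the script prints `--num-executors 17 --executor-cores 5 --executor-memory 19G`. Treat the result as a starting point and tune from there based on the workload.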

On to the second part of your question: Spark has two --deploy-mode options, which you have already named, i.e. cluster and client.

  • client mode is when you connect an external machine to a cluster and run a Spark job from that external machine, like when you connect your laptop to a cluster and run spark-shell from it. The driver JVM is invoked on your laptop, and the session is killed as soon as you disconnect your laptop. The case is similar for a spark-submit job: if you run a job with --deploy-mode client, your laptop hosts the driver, and the job is killed as soon as the machine disconnects (not sure about this one).
  • cluster mode: when you specify --deploy-mode cluster for your job, then even if you submit it from your laptop or any other machine, the job (JAR) is taken care of by the ResourceManager and ApplicationMaster, just like any other application in YARN. You won't be able to see the output on your screen, but most non-trivial Spark jobs write their results to a filesystem anyway, so the output is taken care of that way.
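Concretely, the choice comes down to a single flag on spark-submit. A minimal sketch, assuming a YARN cluster (the JAR name and main class below are placeholders):

```shell
# Client mode: the driver runs on the submitting machine (e.g. your laptop),
# so the job dies if that machine disconnects.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  my-app.jar

# Cluster mode: the driver runs inside the cluster under YARN's control,
# so the job keeps running after the submitting machine disconnects.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```

The resource flags from the first part (--num-executors, --executor-cores, --executor-memory) are passed the same way in either mode; only where the driver runs changes. A common practice is client mode for interactive work and debugging, cluster mode for production jobs.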

