
Executors and cores in Apache Spark

I'm a bit new to Spark and I'm trying to understand a few terms. (I couldn't work them out from online resources.)

Please validate my understanding of the terms below first:

Executor: a container or JVM process that runs on a worker node or data node. We can have multiple executors per node.

Core: a thread within a container or JVM process running on a worker node or data node. We can have multiple cores, i.e. threads, per executor.

Please correct me if I am wrong about either of the above two concepts.

Questions:

  1. Whenever we submit a Spark job, what does that mean? Are we handing the job over to YARN or a resource manager, which will assign resources to my application or job in the cluster and execute it? Is that the correct understanding?
  2. In the command used to submit a job to a Spark cluster, there is an option to set the number of executors.

    spark-submit --class <CLASS_NAME> --num-executors ? --executor-cores ? --executor-memory ? ....

So will this number of executors + cores be set up per node? If not, how can we set a specific number of cores per node?

All of your assumptions are correct. For a detailed explanation of the cluster architecture, please go through this link; you'll get a clear picture. Regarding your second question, num-executors applies to the entire cluster, not per node. The total number of cores available across the cluster is calculated as below:

num-cores-per-node * total-nodes-in-cluster

For example, suppose that you have a 20-node cluster with 4-core machines, and you submit an application with --executor-memory 1G and --total-executor-cores 8. Then Spark will launch eight executors, each with 1 GB of RAM, on different machines. Spark does this by default to give applications a chance to achieve data locality for distributed filesystems running on the same machines (e.g., HDFS), because these systems typically have data spread out across all nodes.
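The arithmetic in that example can be sketched as a toy calculation (the function names here are made up for illustration; this is not a Spark API, just the resource math that the scheduler applies internally):

```python
def cluster_core_capacity(cores_per_node: int, total_nodes: int) -> int:
    """Upper bound on executor cores: num-cores-per-node * total-nodes-in-cluster."""
    return cores_per_node * total_nodes


def executors_launched(total_executor_cores: int, cores_per_executor: int = 1) -> int:
    """With --total-executor-cores N and one core per executor,
    Spark (standalone mode) launches N executors, spread across machines."""
    return total_executor_cores // cores_per_executor


# The answer's example: 20 nodes, 4 cores each, --total-executor-cores 8.
capacity = cluster_core_capacity(cores_per_node=4, total_nodes=20)  # 80 cores available
n_executors = executors_launched(total_executor_cores=8)            # 8 executors launched
print(capacity, n_executors)
```

The requested 8 cores are well under the cluster's 80-core capacity, which is why Spark can afford to place each executor on a different machine for data locality.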

I hope it helps!

