Why is the number of spark executors reduced using custom settings on EMR

I'm running Spark 1.6 in cluster mode on EMR 4.3.0 with the following settings:

 [
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.cores" : "16"
    }
  },
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]

With the following instances:

master: 1 * m3.xlarge
core: 2 * m3.xlarge

When I test the number of executors with:

val numExecutors = sc.getExecutorStorageStatus.size - 1

I only get 2.

Are the EMR settings for Spark somehow being overwritten?

Ok, here is the problem: you are setting the number of cores for each executor, not the number of executors, e.g. "spark.executor.cores" : "16".

And since you are on AWS EMR, this also means that you are using YARN.

By default, the number of executor instances is 2 (spark.executor.instances is the property that defines the number of executors).
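For illustration, a minimal sketch of a spark-defaults classification that sets the executor count explicitly; the values below are placeholders, not tuned recommendations, and since maximizeResourceAllocation computes its own values, you would typically drop it when setting these by hand:

[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.instances": "4",
      "spark.executor.cores": "2"
    }
  }
]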

Note:

  • This property is incompatible with spark.dynamicAllocation.enabled. If both spark.dynamicAllocation.enabled and spark.executor.instances are specified, dynamic allocation is turned off and the specified number of spark.executor.instances is used.
  • Fewer cores per executor generally means more executors, but in this case you will have to manage the number of cores through YARN, since YARN manages the cluster for you and by default allocates 1 core per executor (you can read the effective values back from the driver, as in the sketch after this list).
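As a quick sanity check, here is a minimal sketch, assuming a live SparkContext named sc in the spark-shell, that reads back the settings the driver actually ended up with; the fallback strings passed to get() are just labels for the unset case:

// Inspect the effective executor settings on the driver. The second
// argument to get() is returned when the property was never set explicitly.
val instances = sc.getConf.get("spark.executor.instances", "(unset; YARN default is 2)")
val cores = sc.getConf.get("spark.executor.cores", "(unset; YARN default is 1)")
println(s"executor instances = $instances, cores per executor = $cores")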

Thus you get the following:

scala> val numExecutors = sc.getExecutorStorageStatus.size - 1
numExecutors: Int = 2

This means that you are actually using two executors, one per slave node, each operating on only 1 core.
