
How to increase the number of running containers of a Spark application on an EMR cluster?

I am using an EMR cluster with 1 master node and 11 m5.2xlarge core nodes. After doing some calculations for this instance type, I used the following JSON to set my Spark application configuration on EMR:

[
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
        }
    },
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.nodemanager.vmem-check-enabled": "false",
            "yarn.nodemanager.pmem-check-enabled": "false"
        }
    },
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.worker.instances": "5",
            "spark.driver.memory": "20g",
            "spark.executor.memory": "20g",
            "spark.executor.cores": "5",
            "spark.driver.cores": "5",
            "spark.executor.instances": "14",
            "spark.yarn.executor.memoryOverhead": "4g",
            "spark.default.parallelism": "140"
        }
    },
    {
        "Classification": "spark",
        "Properties": {
            "maximizeResourceAllocation": "false"
        }
    }
]

However, the number of running containers on this cluster is not what I expected (usually it matches the number of running cores). There are only 11 running containers; how can I increase this number to 51, the number of vCores in use?

The instance type m5.2xlarge has 8 vCPUs and 32G RAM. You could do 4 executors per node at 2 vCPUs and 7G per executor, for a total of 44 executors. This would leave you 4G overhead on each worker node, which should be plenty.
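
As a rough sanity check, the sizing arithmetic can be written out as a small back-of-the-envelope Python sketch (the 4G set aside per node is the assumption from the paragraph above, not a value YARN enforces):

    # Back-of-the-envelope executor sizing for 11 x m5.2xlarge core nodes.
    NODES = 11
    VCPUS_PER_NODE = 8
    RAM_PER_NODE_GB = 32

    CORES_PER_EXECUTOR = 2
    MEM_PER_EXECUTOR_GB = 7
    RESERVED_PER_NODE_GB = 4  # assumed headroom left on each worker node

    usable_mem_gb = RAM_PER_NODE_GB - RESERVED_PER_NODE_GB        # 28
    executors_by_cpu = VCPUS_PER_NODE // CORES_PER_EXECUTOR       # 4
    executors_by_mem = usable_mem_gb // MEM_PER_EXECUTOR_GB       # 4
    executors_per_node = min(executors_by_cpu, executors_by_mem)  # 4

    total_executors = NODES * executors_per_node
    print(total_executors)  # 44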

Your spark-defaults config should be thus:

    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.executor.instances": "44",
            "spark.executor.cores": "2",
            "spark.executor.memory": "7g"
        }
    },
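
If you prefer to keep the sizing with the job rather than the cluster configuration, the same values can also be supplied when the application builds its own SparkSession. This is only a minimal sketch under that assumption (the app name is illustrative), not anything EMR-specific:

    from pyspark.sql import SparkSession

    # Minimal sketch: the same executor sizing passed at session creation time.
    spark = (
        SparkSession.builder
        .appName("container-sizing-example")  # illustrative name
        .config("spark.dynamicAllocation.enabled", "false")
        .config("spark.executor.instances", "44")
        .config("spark.executor.cores", "2")
        .config("spark.executor.memory", "7g")
        .getOrCreate()
    )

    # Confirm the settings made it into the running session.
    print(spark.conf.get("spark.executor.instances"))  # 44
    print(spark.conf.get("spark.executor.memory"))     # 7g

Note that these settings must be in place before the SparkContext starts, which is why they are passed to the builder (or baked into spark-defaults) rather than changed after getOrCreate().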
