
How to increase the number of running containers of a Spark application on an EMR cluster?

I am using an EMR cluster with 1 master node and 11 m5.2xlarge core nodes. After doing some calculations for this instance type, I used the following JSON to set my Spark application configuration on EMR:

[
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
        }
    },
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.nodemanager.vmem-check-enabled": "false",
            "yarn.nodemanager.pmem-check-enabled": "false"
        }
    },
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.worker.instances": "5",
            "spark.driver.memory": "20g",
            "spark.executor.memory": "20g",
            "spark.executor.cores": "5",
            "spark.driver.cores": "5",
            "spark.executor.instances": "14",
            "spark.yarn.executor.memoryOverhead": "4g",
            "spark.default.parallelism": "140"
        }
    },
    {
        "Classification": "spark",
        "Properties": {
            "maximizeResourceAllocation": "false"
        }
    }
]

However, the number of running containers on this cluster is not what I expected (usually it matches the number of running cores). There are only 11 running containers; how can I increase this number to 51, the number of vCores in use?

The instance type m5.2xlarge has 8 vCPUs and 32G RAM. You could do 4 executors per node at 2 vCPUs and 7G per executor, for a total of 44 executors. This would leave you 4G overhead on each worker node, which should be plenty.
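
As a rough sanity check, the sizing arithmetic can be written out as a small back-of-the-envelope Python sketch (the 4G set aside per node is the assumption from the paragraph above, not a value YARN enforces):

    # Back-of-the-envelope executor sizing for 11 x m5.2xlarge core nodes.
    NODES = 11
    VCPUS_PER_NODE = 8
    RAM_PER_NODE_GB = 32

    CORES_PER_EXECUTOR = 2
    MEM_PER_EXECUTOR_GB = 7
    RESERVED_PER_NODE_GB = 4  # assumed headroom left on each worker node

    usable_mem_gb = RAM_PER_NODE_GB - RESERVED_PER_NODE_GB        # 28
    executors_by_cpu = VCPUS_PER_NODE // CORES_PER_EXECUTOR       # 4
    executors_by_mem = usable_mem_gb // MEM_PER_EXECUTOR_GB       # 4
    executors_per_node = min(executors_by_cpu, executors_by_mem)  # 4

    total_executors = NODES * executors_per_node
    print(total_executors)  # 44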

Your spark-defaults config should be thus:

    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.executor.instances": "44",
            "spark.executor.cores": "2",
            "spark.executor.memory": "7g"
        }
    },
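
If you prefer to keep the sizing with the job rather than the cluster configuration, the same values can also be supplied when the application builds its own SparkSession. This is only a minimal sketch under that assumption (the app name is illustrative), not anything EMR-specific:

    from pyspark.sql import SparkSession

    # Minimal sketch: the same executor sizing passed at session creation time.
    spark = (
        SparkSession.builder
        .appName("container-sizing-example")  # illustrative name
        .config("spark.dynamicAllocation.enabled", "false")
        .config("spark.executor.instances", "44")
        .config("spark.executor.cores", "2")
        .config("spark.executor.memory", "7g")
        .getOrCreate()
    )

    # Confirm the settings made it into the running session.
    print(spark.conf.get("spark.executor.instances"))  # 44
    print(spark.conf.get("spark.executor.memory"))     # 7g

Note that these settings must be in place before the SparkContext starts, which is why they are passed to the builder (or baked into spark-defaults) rather than changed after getOrCreate().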
