
CloudFormation template to create EMR cluster

I am trying to create EMR 5.30.1 clusters with applications such as Hadoop, Livy, Spark, ZooKeeper, and Hive using a CloudFormation template. The problem is that with this template I can only create the cluster with one application from that list.

Below is the CloudFormation template:

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Best Practice EMR Cluster for Spark or S3 backed Hbase",
  "Parameters": {
    "EMRClusterName": {
      "Description": "Name of the cluster",
      "Type": "String",
      "Default": "emrcluster"
    },
    "KeyName": {
      "Description": "Must be an existing Keyname",
      "Type": "String",
      "Default": "keyfilename"
    },
    "MasterInstanceType": {
      "Description": "Instance type to be used for the master instance.",
      "Type": "String",
      "Default": "m5.xlarge"
    },
    "CoreInstanceType": {
      "Description": "Instance type to be used for core instances.",
      "Type": "String",
      "Default": "m5.xlarge"
    },
    "NumberOfCoreInstances": {
      "Description": "Must be a valid number",
      "Type": "Number",
      "Default": 1
    },
    "SubnetID": {
      "Description": "Must be Valid public subnet ID",
      "Default": "subnet-ee15b3e0",
      "Type": "String"
    },
    "LogUri": {
      "Description": "Must be a valid S3 URL",
      "Default": "s3://aws/elasticmapreduce/",
      "Type": "String"
    },
    "S3DataUri": {
      "Description": "Must be a valid S3 bucket URL ",
      "Default": "s3://aws/elasticmapreduce/",
      "Type": "String"
    },
    "ReleaseLabel": {
      "Description": "Must be a valid EMR release  version",
      "Default": "emr-5.30.1",
      "Type": "String"
    },
    "Applications": {
      "Description": "Please select which application will be installed on the cluster this would be either Ganglia and spark, or Ganglia and s3 backed Hbase",
      "Type": "String",
      "AllowedValues": [
        "Spark",
        "Hbase",
        "Hive",
        "Livy",
        "ZooKeeper"
      ]
    }
  },
  "Mappings": {},
  "Conditions": {
    "Spark": {
      "Fn::Equals": [
        {
          "Ref": "Applications"
        },
        "Spark"
      ]
    },
    "Hbase": {
      "Fn::Equals": [
        {
          "Ref": "Applications"
        },
        "Hbase"
      ]
    },
    "Hive": {
      "Fn::Equals": [
        {
          "Ref": "Applications"
        },
        "Hive"
      ]
    },
    "Livy": {
      "Fn::Equals": [
        {
          "Ref": "Applications"
        },
        "Livy"
      ]
    },
    "ZooKeeper": {
      "Fn::Equals": [
        {
          "Ref": "Applications"
        },
        "ZooKeeper"
      ]
    }
  },
  "Resources": {
    "EMRCluster": {
      "DependsOn": [
        "EMRClusterServiceRole",
        "EMRClusterinstanceProfileRole",
        "EMRClusterinstanceProfile"
      ],
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Applications": [
          {
            "Name": "Ganglia"
          },
          {
            "Fn::If": [
              "Spark",
              {
                "Name": "Spark"
              },
              {
                "Ref": "AWS::NoValue"
              }
            ]
          },
          {
            "Fn::If": [
              "Hbase",
              {
                "Name": "Hbase"
              },
              {
                "Ref": "AWS::NoValue"
              }
            ]
          },
          {
            "Fn::If": [
              "Hive",
              {
                "Name": "Hive"
              },
              {
                "Ref": "AWS::NoValue"
              }
            ]
          },
          {
            "Fn::If": [
              "Livy",
              {
                "Name": "Livy"
              },
              {
                "Ref": "AWS::NoValue"
              }
            ]
          },
          {
            "Fn::If": [
              "ZooKeeper",
              {
                "Name": "ZooKeeper"
              },
              {
                "Ref": "AWS::NoValue"
              }
            ]
          }
        ],
        "Configurations": [
          {
            "Classification": "hbase-site",
            "ConfigurationProperties": {
              "hbase.rootdir":{"Ref":"S3DataUri"}
            }
          },
          {
            "Classification": "hbase",
            "ConfigurationProperties": {
              "hbase.emr.storageMode": "s3"
            }
          }
        ],
        "Instances": {
          "Ec2KeyName": {
            "Ref": "KeyName"
          },
          "Ec2SubnetId": {
            "Ref": "SubnetID"
          },
          "MasterInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": {
              "Ref": "MasterInstanceType"
            },
            "Market": "ON_DEMAND",
            "Name": "Master"
          },
          "CoreInstanceGroup": {
            "InstanceCount": {
              "Ref": "NumberOfCoreInstances"
            },
            "InstanceType": {
              "Ref": "CoreInstanceType"
            },
            "Market": "ON_DEMAND",
            "Name": "Core"
          },
          "TerminationProtected": false
        },
        "VisibleToAllUsers": true,
        "JobFlowRole": {
          "Ref": "EMRClusterinstanceProfile"
        },
        "ReleaseLabel": {
          "Ref": "ReleaseLabel"
        },
        "LogUri": {
          "Ref": "LogUri"
        },
        "Name": {
          "Ref": "EMRClusterName"
        },
        "AutoScalingRole": "EMR_AutoScaling_DefaultRole",
        "ServiceRole": {
          "Ref": "EMRClusterServiceRole"
        }
      }
    },
    "EMRClusterServiceRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "elasticmapreduce.amazonaws.com"
                ]
              },
              "Action": [
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "ManagedPolicyArns": [
          "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"
        ],
        "Path": "/"
      }
    },
    "EMRClusterinstanceProfileRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "ec2.amazonaws.com"
                ]
              },
              "Action": [
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "ManagedPolicyArns": [
          "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
        ],
        "Path": "/"
      }
    },
    "EMRClusterinstanceProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "Path": "/",
        "Roles": [
          {
            "Ref": "EMRClusterinstanceProfileRole"
          }
        ]
      }
    }
  },
  "Outputs": {}
}

I also want to add a bootstrap script to this template. Can anyone please help me with this?

As far as I understand, Applications in your case should be an array like the one below, as described in the documentation:

 "Applications" : [ Application, ... ],

In your case, you can list the applications like this:

 "Applications" : [
     {"Name" : "Spark"},
     {"Name" : "Hbase"},
     {"Name" : "Hive"},
     {"Name" : "Livy"},
     {"Name" : "ZooKeeper"}
 ]

Besides Name, each application entry accepts further properties, such as Args and AdditionalInfo; see the documentation for details.
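
In the template from the question, the Applications parameter is a single String restricted by AllowedValues, so at most one of the Fn::If conditions can be true at a time. If the choice still has to be made at deploy time rather than hardcoded, one option is a separate true/false parameter per application. The fragment below is a minimal sketch of that idea for two applications only. The parameter names InstallSpark and InstallHive and the condition names UseSpark and UseHive are hypothetical, and the other cluster properties from the question (Instances, roles, and so on) are omitted and would need to be merged back in.

```json
{
  "Parameters": {
    "InstallSpark": {
      "Description": "Hypothetical toggle: set to true to install Spark",
      "Type": "String",
      "AllowedValues": ["true", "false"],
      "Default": "true"
    },
    "InstallHive": {
      "Description": "Hypothetical toggle: set to true to install Hive",
      "Type": "String",
      "AllowedValues": ["true", "false"],
      "Default": "true"
    }
  },
  "Conditions": {
    "UseSpark": { "Fn::Equals": [{ "Ref": "InstallSpark" }, "true"] },
    "UseHive": { "Fn::Equals": [{ "Ref": "InstallHive" }, "true"] }
  },
  "Resources": {
    "EMRCluster": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Applications": [
          { "Name": "Ganglia" },
          { "Fn::If": ["UseSpark", { "Name": "Spark" }, { "Ref": "AWS::NoValue" }] },
          { "Fn::If": ["UseHive", { "Name": "Hive" }, { "Ref": "AWS::NoValue" }] }
        ]
      }
    }
  }
}
```

Because each condition reads its own parameter, any combination of applications can be switched on in one deployment; the same pattern would repeat for Livy, ZooKeeper, and the rest.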

You can do it the following way.

If you set "ReleaseLabel", there is no need to specify versions for the applications:

"Applications": [{ "Name": "Hive" }, { "Name": "Presto" }, { "Name": "Spark" } ]

For the bootstrap script:

"BootstrapActions": [{ "Name": "setup", "ScriptBootstrapAction": { "Path": "s3://bucket/key/Bootstrap.sh" } }]

Define it like this to install all the applications at once.

{
    "Type": "AWS::EMR::Cluster",
    "Properties": {
        "Applications": [
            {
                "Name": "Ganglia"
            },
            {
                "Name": "Spark"
            },
            {
                "Name": "Livy"
            },
            {
                "Name": "ZooKeeper"
            },
            {
                "Name": "JupyterHub"
            }
        ]
    }
}
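
Combining the suggestions above for the applications named in the question, the EMRCluster resource might look roughly like the sketch below. The bootstrap script path is the placeholder from the earlier answer, the Args values are purely hypothetical, and the remaining properties from the original template (Instances, JobFlowRole, ServiceRole, Name, LogUri, and so on) are left out for brevity and would still be required.

```json
{
  "EMRCluster": {
    "Type": "AWS::EMR::Cluster",
    "Properties": {
      "ReleaseLabel": { "Ref": "ReleaseLabel" },
      "Applications": [
        { "Name": "Hadoop" },
        { "Name": "Livy" },
        { "Name": "Spark" },
        { "Name": "ZooKeeper" },
        { "Name": "Hive" }
      ],
      "BootstrapActions": [
        {
          "Name": "setup",
          "ScriptBootstrapAction": {
            "Path": "s3://bucket/key/Bootstrap.sh",
            "Args": ["hypothetical-arg-1", "hypothetical-arg-2"]
          }
        }
      ]
    }
  }
}
```

With the application list fixed like this, the Applications parameter and the Conditions block from the question are no longer referenced and can be removed.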
