Creating an AWS Data Pipeline EMR cluster using ShellCommandActivity
When I create an Amazon EMR cluster, I can do it through the simple wizard in the AWS Management Console. Once the cluster is up, I can test it, and when I'm happy with its configuration I can click the AWS CLI Export button and copy the CLI command that creates the cluster.
I need to create an EMR cluster as part of my AWS Data Pipeline process. Rather than configuring an EmrCluster object and then running whatever EmrActivity I want, I'm wondering whether I could just copy the CLI command I exported during testing and paste it into a ShellCommandActivity, which would then create the cluster. From there I could either use an EmrActivity to do the processing or simply do the processing in the ShellCommandActivity itself.
Can I create my AWS Data Pipeline EMR cluster using a CLI command that's run through a ShellCommandActivity? And if I do, will I be able to run an EmrActivity against that cluster?

I just think it would be easier to create the cluster this way, because I can use the AWS Management Console to create and test my EMR cluster before exporting the CLI command, rather than going through the process of properly constructing the cluster through the AWS Data Pipeline wizard/JSON definition. In other words, the EMR wizard in the AWS Management Console is far easier to use than the Data Pipeline wizard for creating an EMR cluster, especially when it comes to choosing security groups and the various other configuration options.
Update:
I just verified that I can in fact run a CLI command through a ShellCommandActivity to create my EMR cluster from the Data Pipeline, but is this a code smell or bad practice? Are there any downsides to creating an EMR cluster in the Data Pipeline this way rather than through the predefined EmrCluster object?
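As a rough sketch of the setup described in the update, the pipeline objects below are built as Python dicts in the pipeline-definition JSON shape: an Ec2Resource to run the command, and a ShellCommandActivity whose command is an `aws emr create-cluster` call. The object IDs, instance types, release label, and create-cluster flags are placeholder assumptions; in practice you would paste the command copied from the console's AWS CLI Export button. Schedule and default objects are omitted for brevity.

```python
import json
import shlex


def create_cluster_objects(cluster_name: str) -> list[dict]:
    """Build hypothetical pipeline objects that create an EMR cluster via the AWS CLI."""
    # Placeholder flags: replace with the command exported from the EMR console.
    cli_args = [
        "aws", "emr", "create-cluster",
        "--name", cluster_name,
        "--release-label", "emr-6.15.0",  # assumption: use your own release
        "--applications", "Name=Spark",
        "--instance-type", "m5.xlarge",
        "--instance-count", "3",
        "--use-default-roles",
    ]
    # Quote each argument so the command survives being run by a shell.
    command = " ".join(shlex.quote(arg) for arg in cli_args)

    ec2_runner = {
        "id": "CliRunner",
        "name": "CliRunner",
        "type": "Ec2Resource",
        "instanceType": "m5.large",  # placeholder instance type
        "terminateAfter": "2 Hours",
    }
    create_cluster = {
        "id": "CreateEmrCluster",
        "name": "CreateEmrCluster",
        "type": "ShellCommandActivity",
        "command": command,
        # The shell command runs on the EC2 instance above, not on the new cluster.
        "runsOn": {"ref": ec2_runner["id"]},
    }
    return [ec2_runner, create_cluster]


if __name__ == "__main__":
    print(json.dumps({"objects": create_cluster_objects("my-cluster")}, indent=2))
```

One consequence of this design is that Data Pipeline only sees a shell command: it does not know a cluster exists, so it will not wait for the cluster to become ready or terminate it for you, and the EC2 resource's role needs permission to call EMR.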
It's possible, but a little complicated: you need to specify a workerGroup instead of runsOn in the EmrActivity. You also need to install Task Runner on the cluster, since Task Runner is what polls Data Pipeline for the tasks assigned to that worker group and executes them.
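A minimal sketch of this suggestion, again using Python dicts in the pipeline-definition JSON shape: the EmrActivity names a worker group instead of pointing runsOn at an EmrCluster object, and Task Runner on the cluster is started with the same worker group so it picks up that activity. The worker group name, step, region, log location, credentials file, and Task Runner JAR filename are hypothetical placeholders; check the Task Runner documentation for the exact options of the version you install.

```python
import json


def emr_activity_for_worker_group(worker_group: str, step: str) -> dict:
    """Hypothetical EmrActivity that targets a worker group instead of an EmrCluster."""
    return {
        "id": "ProcessOnExistingCluster",
        "name": "ProcessOnExistingCluster",
        "type": "EmrActivity",
        # No "runsOn": the task goes to whichever Task Runner polls this worker group.
        "workerGroup": worker_group,
        # Comma-separated step: JAR location followed by its arguments.
        "step": step,
    }


def task_runner_command(worker_group: str, region: str, log_uri: str) -> list[str]:
    """Command line to start Task Runner on the cluster (sketch; verify options)."""
    return [
        "java", "-jar", "TaskRunner-1.0.jar",  # placeholder JAR filename
        "--config", "credentials.json",        # placeholder credentials file
        f"--workerGroup={worker_group}",       # must match the activity's workerGroup
        f"--region={region}",
        f"--logUri={log_uri}",
    ]


if __name__ == "__main__":
    group = "existing-emr-cluster"
    activity = emr_activity_for_worker_group(group, "s3://example-bucket/my-job.jar,arg1,arg2")
    print(json.dumps({"objects": [activity]}, indent=2))
    print(" ".join(task_runner_command(group, "us-east-1", "s3://example-bucket/taskrunner-logs")))
```

The key point is that the worker group string is the only link between the activity and the cluster, so the value in the EmrActivity and the value passed to Task Runner must be identical.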