Creating an AWS Data Pipeline EMR cluster using ShellCommandActivity

When I create an AWS EMR cluster, I can do so through the simple wizard on the AWS Management Console. Once it's created I can test it out, and when I'm happy with its configuration I can simply click the AWS CLI Export button and copy the CLI command that creates the cluster.

I need to create an EMR cluster as part of my AWS Data Pipeline process. Rather than configuring an EmrCluster and then running whatever EmrActivity I want, I'm wondering whether I could just copy the CLI command I exported during testing and paste it into a ShellCommandActivity, which would then create the cluster. From there I could either use an EmrActivity to do some processing or just use the ShellCommandActivity to do the processing.

Can I create my AWS Data Pipeline EMR cluster using a CLI command that's run through a ShellCommandActivity? And if I do, will I be able to run an EmrActivity against that cluster? I think it would be easier to create the cluster this way, because I can build it in the AWS Management Console and test it before exporting the CLI command, rather than constructing it properly through the AWS Data Pipeline wizard/JSON process. That is, the EMR wizard on the AWS Management Console is far easier to use than the Data Pipeline wizard for creating an EMR cluster, especially when it comes to choosing my security groups and various configurations.

Update:

I just verified that I can in fact run a CLI command through the ShellCommandActivity to create my EMR cluster from the Data Pipeline, but is this possibly a code smell or bad practice? Are there any downsides to creating an EMR cluster on the Data Pipeline this way rather than through the predefined EmrCluster object?
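For illustration, the setup described in the update might look roughly like the fragment below. This is only a sketch: the CLI command is a hypothetical stand-in for whatever the console export produced, the object ids `CliHost` and `CreateEmrViaCli` are made up, and the fragment leaves out the schedule and default objects a complete pipeline definition would need. It also assumes the EC2 instance running the command has the AWS CLI available and a role that is allowed to create EMR clusters.

```python
import json

# Hypothetical stand-in for the command copied via the console's CLI export button.
EXPORTED_CLI_COMMAND = (
    'aws emr create-cluster --name "test-cluster" --release-label emr-5.30.0 '
    "--instance-type m5.xlarge --instance-count 3 --use-default-roles"
)


def shell_activity_definition(command):
    """Build a pipeline definition fragment that runs `command` on an EC2 resource."""
    return {
        "objects": [
            {
                "id": "CliHost",  # hypothetical id for the instance that runs the CLI
                "type": "Ec2Resource",
                "terminateAfter": "1 Hour",
            },
            {
                "id": "CreateEmrViaCli",  # hypothetical id
                "type": "ShellCommandActivity",
                "command": command,
                "runsOn": {"ref": "CliHost"},
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(shell_activity_definition(EXPORTED_CLI_COMMAND), indent=2))
```

Note that a cluster created this way is not a pipeline-managed resource, so the pipeline will not terminate it when the run finishes.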

It's possible, but a little complicated:

  1. Either the following activity or the script itself has to wait for the cluster to be created. Make sure the activity does not time out while waiting.
  2. The data pipeline does not know about the cluster, so you need to specify a workerGroup instead of runsOn in the EmrActivity. You also need to install Task Runner on the cluster.
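The two points above can be sketched as follows. This is a minimal illustration under stated assumptions, not a tested pipeline: `wait_for_cluster` is a hypothetical helper that polls a state-returning callable until the cluster is usable or a timeout passes, `cluster_state_from_cli` shows one way to supply that callable through `aws emr describe-cluster` (it assumes the AWS CLI and credentials are available), and `emr_activity_for_worker_group` builds an EmrActivity object that names a workerGroup and has no runsOn. The activity id, worker group name, and step string are placeholders. Installing and starting Task Runner on the cluster is not covered here.

```python
import subprocess
import time

READY_STATES = {"RUNNING", "WAITING"}
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def cluster_state_from_cli(cluster_id):
    """Ask the AWS CLI for the cluster's current state (needs the CLI and credentials)."""
    result = subprocess.run(
        ["aws", "emr", "describe-cluster", "--cluster-id", cluster_id,
         "--query", "Cluster.Status.State", "--output", "text"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_cluster(get_state, timeout_s=1800, poll_s=30, sleep=time.sleep):
    """Poll get_state() until the cluster is usable, has failed, or the timeout passes."""
    waited = 0
    while True:
        state = get_state()
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster ended in state {state}")
        if waited >= timeout_s:
            raise TimeoutError(f"cluster still {state} after {waited}s")
        sleep(poll_s)
        waited += poll_s


def emr_activity_for_worker_group(worker_group, step):
    """Build an EmrActivity object that uses workerGroup in place of runsOn."""
    return {
        "id": "ProcessOnExistingCluster",  # hypothetical id
        "type": "EmrActivity",
        "workerGroup": worker_group,
        "step": step,
    }
```

In a script run by the ShellCommandActivity, one might call `wait_for_cluster(lambda: cluster_state_from_cli(cluster_id))` with a `timeout_s` shorter than the activity's own timeout, so that the script fails with a clear error before the pipeline cuts it off. The Task Runner on the cluster would need to be started with the same worker group name that the EmrActivity specifies.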

