
Orchestration of jobs using AWS Step Functions with EMR Serverless

Recently Amazon launched EMR Serverless, and I want to repurpose my existing data pipeline orchestration, which uses AWS Step Functions. It has steps that create an EMR cluster, run some Lambda functions, submit Spark jobs (mostly Scala jobs using spark-submit), and finally terminate the cluster. All these steps are of the sync type (arn:aws:states:::elasticmapreduce:addStep.sync).

There is documentation, and there are GitHub samples, describing how to submit jobs from orchestration frameworks such as Airflow, but nothing describes how to use AWS Step Functions with EMR Serverless. Any help in this regard is appreciated.

Primarily, I am interested in repurposing Task states of type arn:aws:states:::elasticmapreduce:addStep.sync, which take parameters such as ClusterId; with EMR Serverless there is no such ID.

In summary, is there an equivalent of "Call Amazon EMR with Step Functions" for EMR Serverless?

Currently there is no direct integration of EMR Serverless with Step Functions. One possible solution is to add a Lambda function as an intermediate layer and use the AWS SDK to create EMR Serverless applications and submit jobs. You would also need an additional Lambda function that polls for job completion (needed when jobs depend on one another), because an EMR job is highly likely to outrun Lambda's 15-minute runtime limit.
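A minimal sketch of the two Lambda functions described above, assuming Python with boto3's `emr-serverless` client. The event keys (`applicationId`, `executionRoleArn`, `entryPoint`, `arguments`, `sparkSubmitParameters`) and the handler names are hypothetical. The poller checks the job once per invocation, so the waiting and looping would be left to the state machine rather than to a long-running Lambda.

```python
# Job-run states after which EMR Serverless will not change the run any further.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def _client(client=None):
    """Return the injected client, or lazily create a boto3 one."""
    if client is not None:
        return client
    import boto3  # imported here so the module also loads where boto3 is absent
    return boto3.client("emr-serverless")


def submit_handler(event, context=None, client=None):
    """Lambda 1: start a Spark job run and return its identifiers."""
    emr = _client(client)
    # Only send optional fields when the caller provided them.
    spark_submit = {"entryPoint": event["entryPoint"]}
    if event.get("arguments"):
        spark_submit["entryPointArguments"] = event["arguments"]
    if event.get("sparkSubmitParameters"):
        spark_submit["sparkSubmitParameters"] = event["sparkSubmitParameters"]
    response = emr.start_job_run(
        applicationId=event["applicationId"],
        executionRoleArn=event["executionRoleArn"],
        jobDriver={"sparkSubmit": spark_submit},
    )
    return {"applicationId": event["applicationId"], "jobRunId": response["jobRunId"]}


def poll_handler(event, context=None, client=None):
    """Lambda 2: report the current state of a job run (one check per call)."""
    emr = _client(client)
    response = emr.get_job_run(
        applicationId=event["applicationId"], jobRunId=event["jobRunId"]
    )
    state = response["jobRun"]["state"]
    return {**event, "state": state, "done": state in TERMINAL_STATES}
```

In the state machine, the poller would typically sit behind a Wait state, with a Choice state that loops back until `done` is true and then branches on `state`.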

As @KhalidJahangeer said, there is still no direct integration between EMR Serverless and Step Functions, but you might want to check out the newer AWS SDK Service Integrations capability.

This option lets you call the APIs of services that aren't directly integrated with Step Functions. Previously, if a service integration was not available, you had to code the integration in an AWS Lambda function; now that extra step can be removed, as AWS explains:

You can create state machines that use AWS SDK Service Integrations with Amazon States Language (ASL), AWS Cloud Development Kit (AWS CDK), or visually using AWS Step Function Workflow Studio. To get started, create a new Task state. Then call AWS SDK services directly from the ASL in the resource field of a task state. To do this, use the following syntax.

 arn:aws:states:::aws-sdk:serviceName:apiAction.[serviceIntegrationPattern]
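As an illustration of that syntax applied to EMR Serverless, here is a small Python sketch that fills in the template and prints a single Task state as ASL JSON. The service name `emrserverless`, the explicit `ClientToken`, and the input paths (`$.ApplicationId`, `$.ExecutionRoleArn`, `$.EntryPoint`) are assumptions; check them against the Step Functions list of supported SDK integrations and your own input shape. The helper `sdk_resource` is hypothetical.

```python
import json


def sdk_resource(service_name, api_action, pattern=None):
    """Fill in arn:aws:states:::aws-sdk:serviceName:apiAction.[serviceIntegrationPattern]."""
    arn = f"arn:aws:states:::aws-sdk:{service_name}:{api_action}"
    return f"{arn}.{pattern}" if pattern else arn


# A single Task state that submits a Spark job to an existing application.
start_job_state = {
    "Type": "Task",
    # Assumption: EMR Serverless is exposed under the service name "emrserverless".
    "Resource": sdk_resource("emrserverless", "startJobRun"),
    "Parameters": {
        "ApplicationId.$": "$.ApplicationId",
        "ExecutionRoleArn.$": "$.ExecutionRoleArn",
        # Assumption: the idempotency token is supplied explicitly here,
        # since no SDK client is present to generate one.
        "ClientToken.$": "States.UUID()",
        "JobDriver": {"SparkSubmit": {"EntryPoint.$": "$.EntryPoint"}},
    },
    "End": True,
}

print(json.dumps(start_job_state, indent=2))
```

Note that an ApplicationId takes the place of the ClusterId used by the elasticmapreduce integration.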

Some important things to keep in mind are:

  • Call AWS SDK services directly from the ASL in the resource field of a task state. To do this, use the following syntax: arn:aws:states:::aws-sdk:serviceName:apiAction.[serviceIntegrationPattern]
  • Use camelCase for apiAction names in the Resource field, such as copyObject, and use PascalCase for parameter names in the Parameters field, such as CopySource.
  • Step Functions can't autogenerate IAM policies for most AWS SDK Service Integrations, so you need to add those to the IAM role of the state machine manually.
  • Take advantage of ASL intrinsic functions, as those allow you to manipulate the data and avoid using Lambda functions for simple transformations.
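Putting these points together, the following Python sketch builds a state machine definition that submits an EMR Serverless job and polls until it finishes, plus the IAM statements the state machine role would need. It is an illustration under assumptions, not a tested deployment: the service name `emrserverless`, the input fields (`ApplicationId`, `ExecutionRoleArn`, `EntryPoint`), the 30-second wait, and the state names are all hypothetical choices. Since SDK integrations have no `.sync` variant, the Wait/Choice loop stands in for it.

```python
import json

SDK = "arn:aws:states:::aws-sdk:emrserverless"  # assumed service name

definition = {
    "Comment": "Submit an EMR Serverless Spark job and wait for it to finish",
    "StartAt": "StartJobRun",
    "States": {
        "StartJobRun": {
            "Type": "Task",
            "Resource": f"{SDK}:startJobRun",  # camelCase apiAction
            "Parameters": {  # PascalCase parameter names
                "ApplicationId.$": "$.ApplicationId",
                "ExecutionRoleArn.$": "$.ExecutionRoleArn",
                "ClientToken.$": "States.UUID()",  # ASL intrinsic function
                "JobDriver": {"SparkSubmit": {"EntryPoint.$": "$.EntryPoint"}},
            },
            "ResultPath": "$.Job",
            "Next": "Wait",
        },
        "Wait": {"Type": "Wait", "Seconds": 30, "Next": "GetJobRun"},
        "GetJobRun": {
            "Type": "Task",
            "Resource": f"{SDK}:getJobRun",
            "Parameters": {
                "ApplicationId.$": "$.ApplicationId",
                "JobRunId.$": "$.Job.JobRunId",
            },
            # Keep only the job state from the response.
            "ResultSelector": {"State.$": "$.JobRun.State"},
            "ResultPath": "$.Status",
            "Next": "CheckState",
        },
        "CheckState": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.Status.State", "StringEquals": "SUCCESS", "Next": "Succeeded"},
                {"Variable": "$.Status.State", "StringEquals": "FAILED", "Next": "JobFailed"},
                {"Variable": "$.Status.State", "StringEquals": "CANCELLED", "Next": "JobFailed"},
            ],
            "Default": "Wait",  # any non-terminal state: keep polling
        },
        "Succeeded": {"Type": "Succeed"},
        "JobFailed": {"Type": "Fail", "Error": "EmrServerlessJobFailed"},
    },
}

# IAM statements to add manually to the state machine role.
# "Resource": "*" is a placeholder; narrow it to your application and job role.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["emr-serverless:StartJobRun", "emr-serverless:GetJobRun"],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "*",
            "Condition": {
                "StringLike": {"iam:PassedToService": "emr-serverless.amazonaws.com"}
            },
        },
    ],
}

print(json.dumps(definition, indent=2))
print(json.dumps(policy, indent=2))
```

The printed definition can be pasted into Workflow Studio's code view or passed to whatever tool creates the state machine; several interdependent jobs would chain one such submit-and-poll loop after another.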


 