Create AWS EMR Serverless job with AWS CDK

I'm using AWS CDK to set up my infrastructure. Is there any way to create an ETL job through an EMR Serverless application with AWS CDK?

I can create the serverless application with CDK, but I can't find how to create a job.

There's not currently a built-in way to create a job with CDK (or CloudFormation). This is partially because CDK is typically used to deploy infrastructure while something like Airflow or Step Functions would be used to trigger an actual job on a recurring basis.
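For comparison, here is a minimal sketch of what such an external trigger (for example, a scheduled task) might run to start a job with boto3. The helper names `build_start_job_run_request` and `start_job` and the default job name are hypothetical; the request keys mirror the StartJobRun parameters used in the CDK example in this answer.

```python
from typing import Any, Dict


def build_start_job_run_request(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    name: str = "scheduledJob",  # hypothetical default job name
) -> Dict[str, Any]:
    """Build the keyword arguments for the EMR Serverless StartJobRun call."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "name": name,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
    }


def start_job(client: Any, **request_args: Any) -> str:
    """Start a job run and return its jobRunId.

    `client` is assumed to be boto3.client("emr-serverless"); any object
    exposing a compatible start_job_run method also works.
    """
    request = build_start_job_run_request(**request_args)
    response = client.start_job_run(**request)
    return response["jobRunId"]
```

With boto3 installed and credentials configured, this would be called as `start_job(boto3.client("emr-serverless"), application_id=..., execution_role_arn=..., entry_point=...)`, passing your own application ID and role ARN.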

You could, in theory, write a custom resource to trigger a job. Here's an example of how to do so with Python CDK. This code creates an EMR Serverless application, a role that can be used with the job (no access granted in this case), and a custom resource that starts the job. Because the custom resource only defines an on_create call, the job is started once, when the resource is first created. Note that the policy attached to the custom resource must allow iam:PassRole on the EMR Serverless job execution role.

from aws_cdk import Stack
from aws_cdk import aws_emrserverless as emrs
from aws_cdk import aws_iam as iam
from aws_cdk import custom_resources as custom
from constructs import Construct


class EmrServerlessJobRunStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Create a serverless Spark app
        serverless_app = emrs.CfnApplication(
            self,
            "spark_app",
            release_label="emr-6.9.0",
            type="SPARK",
            name="cdk-spark",
        )

        # We need an execution role to run the job; this one has no access to anything.
        # The Lambda that starts the job is granted iam:PassRole on this role below.
        role = iam.Role(
            scope=self,
            id="spark_job_execution_role",
            assumed_by=iam.ServicePrincipal("emr-serverless.amazonaws.com"),
        )

        # Create a custom resource that starts a job run
        myjobrun = custom.AwsCustomResource(
            self,
            "serverless-job-run",
            on_create={
                "service": "EMRServerless",
                "action": "startJobRun",
                "parameters": {
                    "applicationId": serverless_app.attr_application_id,
                    "executionRoleArn": role.role_arn,
                    "name": "cdkJob",
                    "jobDriver": {"sparkSubmit": {"entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py"}},
                },
                "physical_resource_id": custom.PhysicalResourceId.from_response(
                    "jobRunId"
                ),
            },
            policy=custom.AwsCustomResourcePolicy.from_sdk_calls(
                resources=custom.AwsCustomResourcePolicy.ANY_RESOURCE
            ),
        )

        # Ensure the Lambda can call startJobRun with the earlier-created role
        myjobrun.grant_principal.add_to_policy(
            iam.PolicyStatement(
                effect=iam.Effect.ALLOW,
                resources=[role.role_arn],
                actions=["iam:PassRole"],
                conditions={
                    "StringLike": {
                        "iam:PassedToService": "emr-serverless.amazonaws.com"
                    }
                },
            )
        )
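The custom resource above only starts the job; it does not wait for it to finish. As a rough sketch, the run could be checked afterwards by polling GetJobRun with boto3. The helper name `wait_for_job_run` and its defaults are hypothetical, and the set of terminal states is an assumption based on my reading of the EMR Serverless API.

```python
import time
from typing import Any, Callable

# Assumed terminal states for an EMR Serverless job run.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job_run(
    client: Any,
    application_id: str,
    job_run_id: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_job_run until the run reaches a terminal state; return that state.

    `client` is assumed to be boto3.client("emr-serverless").
    """
    for _ in range(max_polls):
        response = client.get_job_run(
            applicationId=application_id, jobRunId=job_run_id
        )
        state = response["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(
        f"Job run {job_run_id} did not finish after {max_polls} polls"
    )
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays; in normal use the default `time.sleep` is fine.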
