How to trigger a Glue ETL PySpark job via an S3 event or AWS Lambda?

Question

I'm planning to write some jobs in AWS Glue ETL using PySpark, and I want them to be triggered whenever a new file is dropped into an AWS S3 location, just as S3 events can trigger AWS Lambda functions.

However, I see only a very limited set of options for triggering a Glue ETL script. Any help would be highly appreciated.

Answer 1

The following should work to trigger a Glue job from AWS Lambda. Configure the Lambda function with an S3 event trigger on the appropriate bucket, and assign it an IAM role whose permissions allow it to start the AWS Glue job on behalf of the user.

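import boto3

print('Loading function')


def lambda_handler(_event, _context):
    # The Glue client uses the Lambda execution role's credentials.
    glue = boto3.client('glue')
    glue_job_name = "YOUR GLUE JOB NAME"

    try:
        # Start the job; the response carries the JobRunId of the new run.
        run = glue.start_job_run(JobName=glue_job_name)
        # Look up the run's state right after submission.
        status = glue.get_job_run(JobName=glue_job_name, RunId=run['JobRunId'])
        print("Job Status : ", status['JobRun']['JobRunState'])
    except Exception as e:
        # Log the error and re-raise so the invocation is marked as failed.
        print(e)
        raise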
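
The handler above ignores the incoming event, so the Glue job does not learn which file arrived. A minimal sketch of one way to forward that information, assuming the standard S3 event notification layout: the argument names `--s3_bucket` and `--s3_key` and the helpers `build_job_arguments` and `start_runs` are hypothetical, chosen only for illustration.

```python
import urllib.parse

GLUE_JOB_NAME = "YOUR GLUE JOB NAME"  # placeholder, as in the answer above


def build_job_arguments(record):
    """Turn one S3 event record into Glue job arguments (hypothetical names)."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {"--s3_bucket": bucket, "--s3_key": key}


def start_runs(event, glue_client, job_name=GLUE_JOB_NAME):
    """Start one Glue job run per S3 record and return the run IDs."""
    run_ids = []
    for record in event.get("Records", []):
        response = glue_client.start_job_run(
            JobName=job_name,
            Arguments=build_job_arguments(record),
        )
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event, _context):
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    return start_runs(event, boto3.client("glue"))
```

Inside the Glue script, these values could then be read with `getResolvedOptions(sys.argv, ['s3_bucket', 's3_key'])` from `awsglue.utils`.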

Answer 2

Updated "S3-native" answer, if you don't want to use a Lambda:

https://aws.amazon.com/about-aws/whats-new/2021/10/aws-glue-crawlers-amazon-s3-notifications/

https://docs.aws.amazon.com/glue/latest/dg/crawler-s3-event-notifications.html
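
Note that the linked pages describe Glue crawlers rather than ETL jobs. As a rough sketch of what that setup might look like with boto3, assuming an SQS queue that already receives the bucket's S3 event notifications: the helper names and all parameter values are hypothetical, and the crawler still has to be run on a schedule or on demand.

```python
def build_event_mode_crawler_request(name, role_arn, database, s3_path, queue_arn):
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler reads S3 change events from this SQS queue.
        "Targets": {"S3Targets": [{"Path": s3_path, "EventQueueArn": queue_arn}]},
        # Crawl only what the queued events point at, not the whole path.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    }


def create_event_mode_crawler(glue_client, **params):
    """Create the crawler using a client such as boto3.client('glue')."""
    glue_client.create_crawler(**build_event_mode_crawler_request(**params))
```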

Answer 1
Score: 9, accepted, 2019-08-26 09:43:31

Answer 2
Score: 0, 2022-09-13 10:57:49
