How to Trigger Glue ETL Pyspark job through S3 Events or AWS Lambda?

I'm planning to write some jobs in AWS Glue ETL using PySpark, and I want them to be triggered whenever a new file is dropped in an AWS S3 location, just as S3 events are used to trigger AWS Lambda functions.

However, I see only a narrow set of options for triggering a Glue ETL script. Any help on this would be highly appreciated.

The following should work to trigger a Glue job from AWS Lambda. Configure the Lambda function with an S3 event trigger on the appropriate bucket, and assign it an IAM role with permissions that allow it to start the AWS Glue job on behalf of the user.

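import boto3

print('Loading function')

def lambda_handler(_event, _context):
    # The event payload is not used here; any invocation starts the job.
    glue = boto3.client('glue')
    gluejobname = "YOUR GLUE JOB NAME"

    try:
        # Start the Glue job, then read back the initial state of that run.
        run = glue.start_job_run(JobName=gluejobname)
        status = glue.get_job_run(JobName=gluejobname, RunId=run['JobRunId'])
        print("Job Status : ", status['JobRun']['JobRunState'])
    except Exception as e:
        # Log the error and re-raise so the invocation is reported as failed.
        print(e)
        raise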
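
The handler above ignores the incoming event, so the Glue job has no way of knowing which file arrived. As a rough sketch only, and assuming the Lambda function is invoked by a standard S3 event notification, the bucket and object key could be pulled from the event and passed to the job as arguments. The argument names `--s3_bucket` and `--s3_key` and the helper `build_job_arguments` are hypothetical names chosen for illustration; they are not part of the original answer.

```python
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "YOUR GLUE JOB NAME"  # placeholder, replace with your job name


def build_job_arguments(event):
    """Return one Glue argument dict per S3 object listed in the event."""
    arguments = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        arguments.append({
            # Hypothetical argument names; Glue job arguments start with "--".
            "--s3_bucket": s3_info["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 notifications.
            "--s3_key": unquote_plus(s3_info["object"]["key"]),
        })
    return arguments


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without boto3.
    import boto3

    glue = boto3.client("glue")
    run_ids = []
    for job_args in build_job_arguments(event):
        # Start one job run per uploaded object, passing its location along.
        response = glue.start_job_run(JobName=GLUE_JOB_NAME, Arguments=job_args)
        run_ids.append(response["JobRunId"])
    return {"started_runs": run_ids}
```

Inside the Glue script, these values could then be read with `getResolvedOptions(sys.argv, ['s3_bucket', 's3_key'])` from `awsglue.utils`, assuming the same argument names are used on both sides.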

