Boto3 Glue in AWS Glue ETL Job

I am running an AWS Glue ETL job (PySpark) in which I create a boto3 Glue client to start a crawler and then do some other PySpark processing. The problem is that the Glue job keeps running after start_crawler is called. It neither raises an error nor finishes, and the crawler is never started. My code snippet is below:

import sys
import boto3
import time

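# Glue API client in the same region as the crawler
glue_client = boto3.client('glue', region_name='us-east-1')
crawler_name = 'test_crawler'

print('Starting crawler...')
print(crawler_name)
# The job appears to hang on this call in the ETL job
glue_client.start_crawler(Name=crawler_name)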

By contrast, if I run the same code in a Python Shell Glue job, it starts the crawler successfully and the job terminates. What am I doing wrong here, or do I need to do something specific for a Glue ETL job?

Edit: My Glue job has a Glue connection attached, which I use to connect to RDS. If I remove the connection, this code works fine, but I need it in order to connect to RDS. Any help?
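
One possible explanation, which nothing in this thread confirms: a job with a Glue connection attached runs inside that connection's VPC subnet, and if the subnet has no route to the Glue API (for example no NAT gateway or Glue VPC endpoint), the boto3 call may sit waiting on the network instead of failing. A minimal sketch for surfacing that quickly is to give the client short timeouts and few retries. The timeout values and the helper names `make_glue_client` and `start_crawler_checked` are illustrative assumptions, not part of the original code.

```python
# Illustrative values (assumptions): short timeouts so a blocked network path
# raises an error quickly instead of leaving the job waiting.
FAIL_FAST_SETTINGS = {
    "connect_timeout": 5,            # seconds to wait for the TCP connection
    "read_timeout": 10,              # seconds to wait for a response
    "retries": {"max_attempts": 1},  # keep retrying to a minimum
}


def make_glue_client(region_name="us-east-1", settings=FAIL_FAST_SETTINGS):
    """Build a Glue client that uses the fail-fast settings above."""
    # Imported here so the rest of the sketch can be read without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client("glue", region_name=region_name, config=Config(**settings))


def start_crawler_checked(client, crawler_name):
    """Call start_crawler and re-raise any failure with the crawler name attached."""
    try:
        client.start_crawler(Name=crawler_name)
    except Exception as exc:
        raise RuntimeError(
            f"start_crawler failed for {crawler_name!r}: {exc}"
        ) from exc
    return True


# Usage inside the job (not executed here):
#   glue_client = make_glue_client()
#   start_crawler_checked(glue_client, "test_crawler")
```

If the call then fails with a connection or timeout error, that points at networking for the connection's subnet rather than at the job code itself.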

This is not an answer to your question, just a tip. I don't think it's a good idea to start the crawler in the same job: you have no control over when the crawler finishes or whether it finishes successfully. What I do is create an AWS Step Functions workflow: first the Glue job and, after it finishes, the crawler as the next step. That way you can control and monitor the process.
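
The workflow described in this tip could look roughly like the state machine definition below, built as a Python dict and serialized to JSON. This is a sketch under assumptions, not the answerer's actual workflow: the state names, the job name `my_etl_job`, and the 30-second polling interval are hypothetical, and the crawler is polled through the Step Functions AWS SDK integration for Glue.

```python
import json


def build_definition(job_name, crawler_name, poll_seconds=30):
    """Return a Step Functions definition: run the job, start the crawler, wait for it."""
    return {
        "Comment": "Run the Glue job, then start the crawler and wait for it",
        "StartAt": "RunGlueJob",
        "States": {
            # .sync makes the workflow wait until the Glue job run completes
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "Next": "StartCrawler",
            },
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "GetCrawler",
            },
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "CheckCrawler",
            },
            # The crawler returns to READY when a run ends; LastCrawl holds the outcome
            "CheckCrawler": {
                "Type": "Choice",
                "Choices": [
                    {
                        "And": [
                            {"Variable": "$.Crawler.State", "StringEquals": "READY"},
                            {
                                "Variable": "$.Crawler.LastCrawl.Status",
                                "StringEquals": "SUCCEEDED",
                            },
                        ],
                        "Next": "Done",
                    },
                    {
                        "Variable": "$.Crawler.State",
                        "StringEquals": "READY",
                        "Next": "CrawlFailed",
                    },
                ],
                "Default": "WaitForCrawler",
            },
            "CrawlFailed": {"Type": "Fail", "Error": "CrawlerDidNotSucceed"},
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    # Hypothetical job name; the crawler name is the one used in the question.
    print(json.dumps(build_definition("my_etl_job", "test_crawler"), indent=2))
```

The printed JSON can be used as the definition when creating the state machine; the execution role would need permission to run the job and to start and read the crawler.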

I was having the same problem and moved my ETL jobs to AWS Glue 3.0, and now the boto3 client works for me. Let me know if this doesn't solve your problem.