
How to get job_id from within the Python script using an AWS Glue Python shell job?

I am trying to access the run ID of an AWS Glue Python shell job from within that job's script. This is the Run ID shown in the first column of the AWS Glue console, something like jr_5fc6d4ecf0248150067f2. How do I get it programmatically within an AWS Glue Python shell job?

Note: Python shell jobs are not the same as PySpark jobs in AWS Glue.

Yeah, it will sound crazy, but I added a job parameter called JOB_NAME, set it to the job's name, and then inside the script I used boto3 to list the job's runs and get its run ID. Probably not the best approach, but it is the only way I found. If anyone has a better solution, I will change the accepted answer.

import boto3
from botocore.exceptions import ClientError


def get_running_job_id(job_name):
    """Return the Id of a RUNNING run of job_name, or None if there is none.

    Caveat: if several runs of the job are running at once, this returns
    the first RUNNING run in the response, which may not be this one.
    """
    glue_client = boto3.session.Session().client('glue')
    try:
        response = glue_client.get_job_runs(JobName=job_name)
    except ClientError as e:
        raise Exception("boto3 client error in get_running_job_id: " + str(e)) from e
    for run in response['JobRuns']:
        print("Job run id is: " + run.get("Id"))
        print("Status is: " + run.get("JobRunState"))
        if run.get("JobRunState") == "RUNNING":
            return run["Id"]
    return None
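The function above returns the first RUNNING run it finds. If you want the selection step to be a bit more deliberate (and testable without calling AWS), you could split it out and pick the most recently started RUNNING run instead. This is only a sketch: `pick_running_run_id` is a hypothetical helper, it assumes the entries look like the `JobRuns` items from boto3's `get_job_runs` (keys `Id`, `JobRunState`, `StartedOn`), and with concurrent runs of the same job it can still pick a run other than the one executing the script.

```python
from datetime import datetime, timezone


def pick_running_run_id(job_runs):
    """Return the Id of the most recently started RUNNING run, or None.

    job_runs: list of dicts shaped like the 'JobRuns' entries of a
    get_job_runs response. This is still a heuristic when several runs of
    the same job are RUNNING at the same time.
    """
    running = [r for r in job_runs if r.get("JobRunState") == "RUNNING"]
    if not running:
        return None
    # boto3 returns StartedOn as a timezone-aware datetime; a missing value
    # sorts as the earliest possible time so it never wins over a real one.
    earliest = datetime.min.replace(tzinfo=timezone.utc)
    newest = max(running, key=lambda r: r.get("StartedOn") or earliest)
    return newest["Id"]
```

Inside the job you would feed it the API response, e.g. `pick_running_run_id(glue_client.get_job_runs(JobName=job_name)['JobRuns'])`.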

I could not find a solution for this. There is no official documentation about it, and sys.argv (the command-line arguments) does not include a JOB_RUN_ID parameter when the Python script runs as an AWS Glue Python Shell job.

In my tests, I found that the arguments passed to a Python Shell job are:

  • job-bookmark-option
  • scriptLocation

However, when running an AWS Glue Spark job, the following arguments are passed:

  • JOB_ID
  • JOB_NAME
  • JOB_RUN_ID
  • job-bookmark-option
  • TempDir

Hence, there is no official or obvious way to find JOB_RUN_ID from inside a Python script running as a Python Shell job on AWS Glue. I will update this answer if AWS fixes this in the future. Thanks!

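Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.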
