boto3 glue get_job_runs - check execution with certain date exists in the response object

Question

I am trying to fetch the Glue job executions that failed the previous day using the 'get_job_runs' function available through boto3's Glue client: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_job_runs

The request syntax does not have an option to filter executions (job runs) by date or status:

response = client.get_job_runs(
    JobName='string',
    NextToken='string',
    MaxResults=123
)

The response I receive looks something like this:

{
  "JobRuns": [
    {
      "Id": "jr_89bfa55b544f7eec4f6ea574dfb0345345uhi4df65e59869e93c5d8f5efef989",
      "Attempt": 0,
      "JobName": "GlueJobName",
      "StartedOn": datetime.datetime(2021, 1, 27, 4, 32, 47, 718000, tzinfo=tzlocal()),
      "LastModifiedOn": datetime.datetime(2021, 1, 27, 4, 36, 14, 975000, tzinfo=tzlocal()),
      "CompletedOn": datetime.datetime(2021, 1, 27, 4, 36, 14, 975000, tzinfo=tzlocal()),
      "JobRunState": "FAILED",
      "Arguments": {
        "--additional-python-modules": "awswrangler",
        "--conf": "spark.executor.memory=40g",
        "--conf ": "spark.driver.memory=40g",
        "--enable-spark-ui": "true",
        "--extra-py-files": "s3://GlueJobName/lttb.py",
        "--job-bookmark-option": "job-bookmark-disable",
        "--spark-event-logs-path": "s3://GlueJobName/glue-script/spark-event-logs"
      },
      "ErrorMessage": "MemoryError: Unable to allocate xxxxx",
      "PredecessorRuns": [],
      "AllocatedCapacity": 8,
      "ExecutionTime": 199,
      "Timeout": 2880,
      "MaxCapacity": 8.0,
      "WorkerType": "G.2X",
      "NumberOfWorkers": 4,
      "LogGroupName": "/aws-glue/jobs",
      "GlueVersion": "2.0"
    }
  ],
  "NextToken": "string"
}

So what I am doing now is looping through the response and checking whether each run's "CompletedOn" date matches yesterday's date (prev_day, calculated using datetime and timedelta). I do this in a while loop to fetch the last 10,000 executions, because a single 'get_job_runs' call is insufficient.

import logging
from datetime import datetime, timedelta

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

glue_client = boto3.client("glue")

def filter_failed_exec_prev_day(executions, prev_day) -> list:
    filtered_resp = []
    for execution in executions['JobRuns']:
        if execution['JobRunState'] == 'FAILED' and execution['CompletedOn'].date() == prev_day:
            filtered_resp.append(execution)
    return filtered_resp


def get_final_executions() -> list:
    final_job_runs_list = []
    MAX_EXEC_SEARCH_CNT = 10000
    prev_day = (datetime.utcnow() - timedelta(days=1)).date()
    buff_exec_cnt = 0
    l_job = 'GlueJobName'

    response = glue_client.get_job_runs(
        JobName=l_job
    )
    resp_count = len(response['JobRuns'])

    if resp_count > 0:
        buff_exec_cnt += resp_count
        filtered_resp = filter_failed_exec_prev_day(response, prev_day)
        final_job_runs_list.extend(filtered_resp)

        while buff_exec_cnt <= MAX_EXEC_SEARCH_CNT:
            if 'NextToken' in response:
                response = glue_client.get_job_runs(
                    JobName=l_job
                )
                buff_exec_cnt += len(response['JobRuns'])
                filtered_resp = filter_failed_exec_prev_day(response, prev_day)
                final_job_runs_list.extend(filtered_resp)
            else:
                logger.info(f"{l_job} executions list: {final_job_runs_list}")
                break
    return final_job_runs_list

Here, I am using a while loop that stops after 10K executions, which is triple the number of executions we see each day on the job.
Now I am hoping to break out of the while loop as soon as I encounter an execution that belongs to prev_day - 1. Is it possible to search the entire response dict for prev_day - 1 to make sure all of the previous day's executions are covered, considering that boto3 returns the CompletedOn attribute as a datetime.datetime object?

Appreciate reading through.

Thank you

Answer 1

I looked at your code, and I think it will always return the same result because you're not iterating through the result set correctly. Here:

while buff_exec_cnt <= MAX_EXEC_SEARCH_CNT:
    if 'NextToken' in response:
        response = glue_client.get_job_runs(
            JobName=l_job
        )

You need to pass the NextToken value to the get_job_runs method, like this:

response = glue_client.get_job_runs(
    JobName=l_job,
    NextToken=response['NextToken']
)
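
Putting the NextToken fix together with the early exit you asked about, a minimal sketch might look like the one below. The function name failed_runs_for_previous_day and its parameters are made up for illustration. It assumes that get_job_runs returns runs newest first, which is what I have seen in practice but should be verified for your job. It also converts each timestamp to UTC before comparing dates, since boto3 returns datetimes in the local timezone while prev_day is derived from UTC. Note that with concurrent or very long runs, a run that started earlier can complete later, so stopping at the first older run is an approximation.

```python
from datetime import datetime, timedelta, timezone


def failed_runs_for_previous_day(glue_client, job_name, max_runs=10000, now=None):
    """Collect FAILED runs that completed yesterday (UTC), stopping early.

    Hypothetical helper: assumes get_job_runs lists runs newest first.
    """
    now = now or datetime.now(timezone.utc)
    prev_day = (now - timedelta(days=1)).date()
    failed, seen, token = [], 0, None

    while seen < max_runs:
        kwargs = {"JobName": job_name}
        if token:
            kwargs["NextToken"] = token  # continue from the previous page
        response = glue_client.get_job_runs(**kwargs)

        for run in response["JobRuns"]:
            seen += 1
            # Runs still in progress have no CompletedOn; fall back to StartedOn.
            stamp = run.get("CompletedOn") or run.get("StartedOn")
            if stamp is None:
                continue
            run_day = stamp.astimezone(timezone.utc).date()
            if run_day < prev_day:
                # Assumed newest-first ordering: every later run is older still.
                return failed
            if run_day == prev_day and run["JobRunState"] == "FAILED":
                failed.append(run)

        token = response.get("NextToken")
        if not token:
            break  # no more pages
    return failed
```

You would call it as failed_runs_for_previous_day(boto3.client("glue"), "GlueJobName"). The max_runs cap is kept as a safety net in case the ordering assumption does not hold.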

solution1
1 2021-05-25 01:48:45