boto3 glue get_job_runs - check execution with certain date exists in the response object
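I'm trying to fetch the Glue job executions that failed on the previous day, using the "get_job_runs" function available through boto3's Glue client. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_job_runs .
The request syntax has no option to filter executions or job runs by date or status: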
response = client.get_job_runs(
    JobName='string',
    NextToken='string',
    MaxResults=123
)
The response I get back looks like this:
{
    "JobRuns": [
        {
            "Id": "jr_89bfa55b544f7eec4f6ea574dfb0345345uhi4df65e59869e93c5d8f5efef989",
            "Attempt": 0,
            "JobName": "GlueJobName",
            "StartedOn": datetime.datetime(2021, 1, 27, 4, 32, 47, 718000, tzinfo=tzlocal()),
            "LastModifiedOn": datetime.datetime(2021, 1, 27, 4, 36, 14, 975000, tzinfo=tzlocal()),
            "CompletedOn": datetime.datetime(2021, 1, 27, 4, 36, 14, 975000, tzinfo=tzlocal()),
            "JobRunState": "FAILED",
            "Arguments": {
                "--additional-python-modules": "awswrangler",
                "--conf": "spark.executor.memory=40g",
                "--conf ": "spark.driver.memory=40g",
                "--enable-spark-ui": "true",
                "--extra-py-files": "s3://GlueJobName/lttb.py",
                "--job-bookmark-option": "job-bookmark-disable",
                "--spark-event-logs-path": "s3://GlueJobName/glue-script/spark-event-logs"
            },
            "ErrorMessage": "MemoryError: Unable to allocate xxxxx",
            "PredecessorRuns": [],
            "AllocatedCapacity": 8,
            "ExecutionTime": 199,
            "Timeout": 2880,
            "MaxCapacity": 8.0,
            "WorkerType": "G.2X",
            "NumberOfWorkers": 4,
            "LogGroupName": "/aws-glue/jobs",
            "GlueVersion": "2.0"
        }
    ],
    "NextToken": "string"
}
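So what I'm doing now is looping over the response object and checking whether the "CompletedOn" date matches yesterday's date (prev_day), which I compute with datetime and timedelta. I do this in a while loop to fetch the last 10,000 executions, because a single call to get_job_runs is not enough.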
import logging
from datetime import datetime, timedelta

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)
glue_client = boto3.client("glue")


def filter_failed_exec_prev_day(executions, prev_day) -> list:
    # Keep only the FAILED runs whose completion date equals prev_day.
    filtered_resp = []
    for execution in executions['JobRuns']:
        if execution['JobRunState'] == 'FAILED' and execution['CompletedOn'].date() == prev_day:
            filtered_resp.append(execution)
    return filtered_resp


def get_final_executions() -> list:
    final_job_runs_list = []
    MAX_EXEC_SEARCH_CNT = 10000
    prev_day = (datetime.utcnow() - timedelta(days=1)).date()
    buff_exec_cnt = 0
    l_job = 'GlueJobName'
    response = glue_client.get_job_runs(
        JobName=l_job
    )
    resp_count = len(response['JobRuns'])
    if resp_count > 0:
        buff_exec_cnt += resp_count
        filtered_resp = filter_failed_exec_prev_day(response, prev_day)
        final_job_runs_list.extend(filtered_resp)
    while buff_exec_cnt <= MAX_EXEC_SEARCH_CNT:
        if 'NextToken' in response:
            response = glue_client.get_job_runs(
                JobName=l_job
            )
            buff_exec_cnt += len(response['JobRuns'])
            filtered_resp = filter_failed_exec_prev_day(response, prev_day)
            final_job_runs_list.extend(filtered_resp)
        else:
            logger.info(f"{l_job} executions list: {final_job_runs_list}")
            break
    return final_job_runs_list
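Here I use the while loop to stop calling once 10K executions have been fetched, which is three times the number of executions we see on the job each day.
Now I'd like to break out of the while loop as soon as I encounter an execution that belongs to prev_day - 1. Is it possible to search the whole response dictionary for prev_day - 1, so that all of the previous day's executions are sure to be covered, given that the CompletedOn attribute we receive from boto3 is a datetime.datetime object?
Thanks for reading through.
Thank you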
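The early exit asked about here could be sketched roughly as follows. This is only an illustration, not a tested solution: split_page and to_utc_date are hypothetical helper names, and the sketch assumes that get_job_runs returns the most recent runs first (which the approach above already relies on) and that runs without a CompletedOn value are still in progress. Dates are normalised to UTC because prev_day is computed from UTC while the response carries tzlocal() timestamps.

```python
from datetime import date, datetime, timedelta, timezone


def to_utc_date(ts: datetime) -> date:
    """Return the UTC calendar date of a timezone-aware datetime."""
    return ts.astimezone(timezone.utc).date()


def split_page(job_runs: list, prev_day: date):
    """Scan one page of JobRuns.

    Returns (failed, reached_older):
      failed        - FAILED runs whose CompletedOn falls on prev_day (UTC)
      reached_older - True if any run on the page completed before prev_day
    """
    failed = []
    reached_older = False
    for run in job_runs:
        completed = run.get("CompletedOn")
        if completed is None:
            # Assumption: no CompletedOn means the run has not finished yet.
            continue
        day = to_utc_date(completed)
        if day < prev_day:
            reached_older = True
        elif day == prev_day and run["JobRunState"] == "FAILED":
            failed.append(run)
    return failed, reached_older


if __name__ == "__main__":
    prev_day = (datetime.now(timezone.utc) - timedelta(days=1)).date()
    sample = [{"JobRunState": "FAILED", "CompletedOn": datetime.now(timezone.utc)}]
    print(split_page(sample, prev_day))
```

In a paging loop, the idea would be to extend the result list with failed after each page and stop requesting further pages once reached_older is True. The whole page is scanned before stopping, since runs that overlap in time may not be strictly ordered by completion time.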
I looked at your code, and I think it may always return the same result, because you are not iterating over the result set correctly. Here:
while buff_exec_cnt <= MAX_EXEC_SEARCH_CNT:
    if 'NextToken' in response:
        response = glue_client.get_job_runs(
            JobName=l_job
        )
You need to pass the NextToken value to the get_job_runs method, like this:
response = glue_client.get_job_runs(
    JobName=l_job, NextToken=response['NextToken']
)
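To show how that fix fits into a complete loop, here is a minimal sketch of a generator that follows NextToken from page to page. The name iter_job_runs and the max_runs parameter are made up for illustration, and client is assumed to be anything that exposes get_job_runs the way the boto3 Glue client does.

```python
def iter_job_runs(client, job_name: str, max_runs: int = 10000):
    """Yield job runs one by one, following NextToken, up to max_runs runs."""
    kwargs = {"JobName": job_name}
    seen = 0
    while seen < max_runs:
        response = client.get_job_runs(**kwargs)
        for run in response["JobRuns"]:
            yield run
            seen += 1
            if seen >= max_runs:
                return
        token = response.get("NextToken")
        if not token:
            # No token in the response: this was the last page.
            return
        # Ask for the next page instead of repeating the first one.
        kwargs["NextToken"] = token


# Possible usage, assuming boto3 is installed and credentials are configured:
# import boto3
# glue_client = boto3.client("glue")
# failed = [r for r in iter_job_runs(glue_client, "GlueJobName")
#           if r["JobRunState"] == "FAILED"]
```

Because it is a generator, the caller can break out of the for loop at any point (for example once a run older than the day of interest shows up) and no further pages are requested.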