Composer (Airflow) DAG RunID conflict in GCP

We have a Cloud Function with a Cloud Storage trigger. It fires whenever a file is uploaded to the bucket and triggers an Airflow DAG, which then processes the file.

The problem is that when multiple files arrive within the same second, the function call fails with the following error:

b'{"error":"Run id manual__2020-07-31T17:48:15+00:00 already exists for dag id pl_imaoc_trigger_dag"}\n'
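To illustrate the collision, here is a minimal sketch. It assumes the default run ID is derived from the trigger time truncated to whole seconds, which is what the format in the error message suggests. The helper `second_resolution_run_id` is hypothetical and is not an Airflow function.

```python
from datetime import datetime, timezone


def second_resolution_run_id(triggered_at):
    """Hypothetical helper: mimic a run ID built from a timestamp
    whose microseconds have been dropped."""
    return "manual__" + triggered_at.replace(microsecond=0).isoformat()


# Two files landing 750 ms apart, but within the same second.
first_file = datetime(2020, 7, 31, 17, 48, 15, 120000, tzinfo=timezone.utc)
second_file = datetime(2020, 7, 31, 17, 48, 15, 870000, tzinfo=timezone.utc)

ids = [second_resolution_run_id(first_file), second_resolution_run_id(second_file)]
print(ids)
print("collision:", ids[0] == ids[1])
```

Both calls yield the same identifier, so the second trigger is rejected as a duplicate.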

To resolve this, we pass our own run_id, for example 'run_id': 'IMAOC_31072020201842766625', which is the date and time with microsecond precision.

Code:

# environ_vars, webserver_id, client_id and data are set earlier in the function.
dag_name = environ_vars['imaoc_meta_dag']
webserver_url = (
    webserver_id
    + '/api/experimental/dags/'
    + dag_name
    + '/dag_runs'
)

print('webserver_url: {}'.format(webserver_url))
# %f appends microseconds, so the run_id differs for calls within the same second
data['run_id'] = _datetime.datetime.now().strftime("IMAOC_%d%m%Y%H%M%S%f")
resp = map_iap_request(webserver_url, client_id, method='POST', json=data)
print('response text:{}'.format(resp))

But this did not resolve it: AIRFLOW_CTX_DAG_RUN_ID still comes through in the "manual__2020-07-31T20:18:43+00:00" format.

I have no idea how to remove this conflict and trigger the DAG when files arrive within the same second.

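Please use the code below; it works. Note that the request body carries "replace_microseconds": False alongside a run_id with microsecond precision.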

import os
from datetime import datetime

# make_iap_request and data (the payload passed to the DAG) are defined elsewhere.
client_id = os.getenv("CLIENT_ID")
# This should be part of your webserver's URL:
# {tenant-project-id}.appspot.com
webserver_id = os.getenv("TENANT_PROJECT")
# The name of the DAG you wish to trigger
dag_name = os.getenv("DAG_NAME")
webserver_url = (
    'https://'
    + webserver_id
    + '.appspot.com/api/experimental/dags/'
    + dag_name
    + '/dag_runs'
)
# Unique run ID with microsecond precision
run_id = datetime.utcnow().strftime('alpaca_%Y-%m-%dT%H:%M:%S.%f')

# replace_microseconds=False asks Airflow to keep the microseconds instead of zeroing them
body = {"conf": data, "run_id": run_id, "replace_microseconds": False}
print(f"JSON body = {body}")

# Make a POST request to IAP, which then triggers the DAG
make_iap_request(webserver_url, client_id, method='POST', json=body)

The above answer worked for me. I added "replace_microseconds": False to the conf dictionary, as shown below:

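import datetime
import requests
# URL, Header and conf are defined elsewhere (endpoint, auth headers, DAG conf).
run_id = 'trig__' + datetime.datetime.utcnow().isoformat()
conf['replace_microseconds'] = False
response = requests.post(URL, headers=Header, json={"conf": conf, "run_id": run_id})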
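The two answers differ in where they put replace_microseconds. As a rough sketch of the first answer's layout (the flag at the top level of the JSON body, next to run_id), the body could be built by a small helper. `build_trigger_body` is a hypothetical name, and the random suffix on the run ID is an extra safeguard added in this sketch, not something either answer uses.

```python
import uuid
from datetime import datetime, timezone


def build_trigger_body(conf, prefix="trig__", now=None):
    """Hypothetical helper: build the JSON body for POST .../dag_runs."""
    now = now or datetime.now(timezone.utc)
    # Microsecond timestamp plus a short random suffix (an assumption of this
    # sketch), so two calls sharing a timestamp still get different run IDs.
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S.%f")
    run_id = f"{prefix}{stamp}_{uuid.uuid4().hex[:8]}"
    return {
        "conf": conf,
        "run_id": run_id,
        # Top-level key, placed as in the first answer's request.
        "replace_microseconds": False,
    }


body_a = build_trigger_body({"file": "a.csv"})
body_b = build_trigger_body({"file": "b.csv"})
print(body_a["run_id"])
print(body_b["run_id"])
```

The returned dictionary could then be passed as json= to make_iap_request or requests.post, as in the answers above.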

