
BigQueryInsertJobOperator error when calling a stored procedure

I'm getting an error when trying to call a stored procedure (SP) in BigQuery with Airflow's BigQueryInsertJobOperator. I have tried many different syntaxes and none of them seem to work. I pulled the same SQL out of the SP, put it into a file, and it ran fine.

Below is the code I used to try to execute the SP in BigQuery:

import os

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.utils.dates import days_ago

PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "project-id")
Dataset = os.environ.get("GCP_Dataset", "dataset")

# default_args is defined elsewhere in the DAG file
with DAG(
    dag_id='dag_id',
    default_args=default_args,
    schedule_interval="@daily",
    start_date=days_ago(1),
    catchup=False,
) as dag:

    Call_SP = BigQueryInsertJobOperator(
        task_id='Call_SP',
        configuration={
            "query": {
                "query": "CALL `" + PROJECT_ID + "." + Dataset + "." + "SP`();",
                #"query": "{% include 'Scripts/Script.sql' %}",
                "useLegacySql": False,
            }
        },
    )

    Call_SP

In the logs I can see that it is outputting the CALL statement I am expecting:

[2022-06-30, 17:32:07 UTC] {bigquery.py:2247} INFO - Executing: {'query': {'query': 'CALL `project-id.dataset.SP`();', 'useLegacySql': False}}
[2022-06-30, 17:32:07 UTC] {bigquery.py:1560} INFO - Inserting job airflow_Call_SP_2022_06_30T16_47_13_748417_00_00_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[2022-06-30, 17:32:13 UTC] {taskinstance.py:1776} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/bigquery.py", line 2269, in execute
    table = job.to_api_repr()["configuration"]["query"]["destinationTable"]
KeyError: 'destinationTable'

It just doesn't make sense to me, since my SP is just a single MERGE statement.

Expounding on @Elad Kalif's answer: I was able to update apache-airflow-providers-google to version 8.0.0 or later using the GCP documentation on how to install a package from PyPI:


gcloud composer environments update <your-environment> \
    --location <your-environment-location> \
    --update-pypi-package "apache-airflow-providers-google>=8.1.0"
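
To confirm that the upgrade took effect, you can check the installed provider version from inside the environment (a minimal sketch; it assumes you can run Python on an Airflow worker, e.g. in a one-off task):

# Sanity check that the provider upgrade took effect.
# importlib.metadata is in the Python standard library from 3.8 onward.
from importlib.metadata import version

print(version("apache-airflow-providers-google"))  # expect 8.0.0 or later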

After installation, running this code (derived from your given code) yielded successful results:

import datetime

from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator


PROJECT_ID = "<your-proj-id>"
Dataset = "<your-dataset>"

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

default_args = {
    'owner': 'Composer Example',
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
    'start_date': YESTERDAY,
}

with models.DAG(
    dag_id='dag_id',
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,
) as dag:

    Call_SP = BigQueryInsertJobOperator(
        task_id='Call_SP',
        configuration={
            "query": {
                "query": "CALL `" + PROJECT_ID + "." + Dataset + "." + "<your-sp>`();",
                #"query": "{% include 'Scripts/Script.sql' %}",
                "useLegacySql": False,
            }
        },
    )

    Call_SP
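
As a side note, instead of concatenating strings you could let Airflow's Jinja templating render the identifiers, since configuration is a templated field of BigQueryInsertJobOperator. A sketch of the same task (not tested against your environment):

    # Sketch: pass the identifiers through `params` and let Jinja render them
    # inside the templated `configuration` field.
    Call_SP = BigQueryInsertJobOperator(
        task_id='Call_SP',
        configuration={
            "query": {
                "query": "CALL `{{ params.project }}.{{ params.dataset }}.<your-sp>`();",
                "useLegacySql": False,
            }
        },
        params={"project": PROJECT_ID, "dataset": Dataset},
    )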


Logs:

*** Reading remote log from gs:///2022-07-04T01:02:28.122529+00:00/1.log.
[2022-07-04, 01:02:33 UTC] {taskinstance.py:1044} INFO - Dependencies all met for <TaskInstance: dag_id.Call_SP manual__2022-07-04T01:02:28.122529+00:00 [queued]>
[2022-07-04, 01:02:33 UTC] {taskinstance.py:1044} INFO - Dependencies all met for <TaskInstance: dag_id.Call_SP manual__2022-07-04T01:02:28.122529+00:00 [queued]>
[2022-07-04, 01:02:33 UTC] {taskinstance.py:1250} INFO - 
--------------------------------------------------------------------------------
[2022-07-04, 01:02:33 UTC] {taskinstance.py:1251} INFO - Starting attempt 1 of 2
[2022-07-04, 01:02:33 UTC] {taskinstance.py:1252} INFO - 
--------------------------------------------------------------------------------
[2022-07-04, 01:02:33 UTC] {taskinstance.py:1271} INFO - Executing <Task(BigQueryInsertJobOperator): Call_SP> on 2022-07-04 01:02:28.122529+00:00
[2022-07-04, 01:02:33 UTC] {standard_task_runner.py:52} INFO - Started process 2461 to run task
[2022-07-04, 01:02:33 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'dag_id', 'Call_SP', 'manual__2022-07-04T01:02:28.122529+00:00', '--job-id', '29', '--raw', '--subdir', 'DAGS_FOLDER/20220704.py', '--cfg-path', '/tmp/tmpjpm9jt5h', '--error-file', '/tmp/tmpnoplzt6r']
[2022-07-04, 01:02:33 UTC] {standard_task_runner.py:80} INFO - Job 29: Subtask Call_SP
[2022-07-04, 01:02:34 UTC] {task_command.py:298} INFO - Running <TaskInstance: dag_id.Call_SP manual__2022-07-04T01:02:28.122529+00:00 [running]> on host airflow-worker-r66xj
[2022-07-04, 01:02:34 UTC] {taskinstance.py:1448} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=
AIRFLOW_CTX_DAG_OWNER=Composer Example
AIRFLOW_CTX_DAG_ID=dag_id
AIRFLOW_CTX_TASK_ID=Call_SP
AIRFLOW_CTX_EXECUTION_DATE=2022-07-04T01:02:28.122529+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-07-04T01:02:28.122529+00:00
[2022-07-04, 01:02:34 UTC] {bigquery.py:2243} INFO - Executing: {'query': {'query': 'CALL ``();', 'useLegacySql': False}}
[2022-07-04, 01:02:34 UTC] {credentials_provider.py:324} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2022-07-04, 01:02:35 UTC] {bigquery.py:1562} INFO - Inserting job airflow_dag_id_Call_SP_2022_07_04T01_02_28_122529_00_00_d8681b858989cd5f36b8b9f4942a96a0
[2022-07-04, 01:02:36 UTC] {taskinstance.py:1279} INFO - Marking task as SUCCESS. dag_id=dag_id, task_id=Call_SP, execution_date=20220704T010228, start_date=20220704T010233, end_date=20220704T010236
[2022-07-04, 01:02:37 UTC] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-07-04, 01:02:37 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check

Project History in BigQuery: [screenshot]

You hit a known bug in apache-airflow-providers-google which was fixed in a PR and released in apache-airflow-providers-google version 8.0.0.

To solve your issue, you should upgrade the Google provider to version 8.0.0 or later.

If for some reason you can't upgrade the provider, you can create a custom operator based on the code in the PR and use it until you are able to upgrade the provider version; see the sketch below.
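
For illustration, a minimal, untested sketch of such a workaround. This is a suppression-based variant inferred from the traceback above, not the actual code from the PR, and the class name is made up:

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator


class BigQueryInsertJobOperatorScriptSafe(BigQueryInsertJobOperator):
    """Hypothetical workaround for provider versions < 8.0.0.

    Script jobs such as CALL statements have no 'destinationTable' in their
    query configuration, so the parent execute() raises a KeyError during its
    table-link bookkeeping, after the query job has already been submitted
    and run.
    """

    def execute(self, context):
        try:
            return super().execute(context)
        except KeyError as err:
            if "destinationTable" not in str(err):
                raise
            # The job itself ran before this bookkeeping step failed; still,
            # verify the CALL in the BigQuery job history before relying on
            # this suppression.
            self.log.info(
                "Query job has no destinationTable (script job); ignoring KeyError."
            )
            return None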
