
Extend BigQueryExecuteQueryOperator with additional labels using jinja2

In order to track GCP costs using labels, I would like to extend BigQueryExecuteQueryOperator with some additional labels so that each task instance gets these labels set automatically in its constructor.

# import paths assume the (backport) providers package
from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator
from airflow.utils.decorators import apply_defaults


class ExtendedBigQueryExecuteQueryOperator(BigQueryExecuteQueryOperator):

    @apply_defaults
    def __init__(self,
                 *args,
                 **kwargs) -> None:
        task_labels = {
            'dag_id': '{{ dag.dag_id }}',      # Jinja template, rendered at runtime
            'task_id': kwargs.get('task_id'),  # plain constructor argument
            'ds': '{{ ds }}',                  # Jinja template, rendered at runtime
            # ugly: all three values are obtained in different ways
        }
        super().__init__(*args, **kwargs)
        if self.labels is None:
            self.labels = task_labels
        else:
            self.labels.update(task_labels)

with DAG(dag_id=...,
         start_date=...,
         schedule_interval=...,
         default_args=...) as dag:

    t1 = ExtendedBigQueryExecuteQueryOperator(
        task_id='t1',
        sql='SELECT 1;',
        labels={'some_additional_label2': 'some_additional_label2'}
        # all labels should be: dag_id, task_id, ds, some_additional_label2
    )

    t2 = ExtendedBigQueryExecuteQueryOperator(
        task_id='t2',
        sql='SELECT 2;',
        labels={'some_additional_label3': 'some_additional_label3'}
        # all labels should be: dag_id, task_id, ds, some_additional_label3
    )

    t1 >> t2

but then I lose the task-level labels some_additional_label2 or some_additional_label3.

You could create the following policy in airflow_local_settings.py:

def policy(task):
    if task.__class__.__name__ == "BigQueryExecuteQueryOperator":
        # labels defaults to None, so build a fresh dict rather than
        # calling .update() on it directly
        task.labels = {**(task.labels or {}),
                       'dag_id': task.dag_id,
                       'task_id': task.task_id}

From the docs:

Your local Airflow settings file can define a policy function that has the ability to mutate task attributes based on other task or DAG attributes. It receives a single argument as a reference to task objects, and is expected to alter its attributes.

More details on applying Policy: https://airflow.readthedocs.io/en/1.10.9/concepts.html#cluster-policy
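Note that comparing task.__class__.__name__ to a string only matches that exact class, so the policy above would not fire for a subclass such as ExtendedBigQueryExecuteQueryOperator. If you want subclasses covered as well, an isinstance check works; a minimal sketch, assuming the operator's import path from the (backport) providers package:

from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator

def policy(task):
    # isinstance() matches the operator itself and any subclass of it
    if isinstance(task, BigQueryExecuteQueryOperator):
        task.labels = {**(task.labels or {}),
                       'dag_id': task.dag_id,
                       'task_id': task.task_id}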

You won't need to extend BigQueryExecuteQueryOperator in that case. The only missing part is the execution date (ds), which you can set in the task itself.

Example:

with DAG(dag_id=...,
         start_date=...,
         schedule_interval=...,
         default_args=...) as dag:

    t1 = BigQueryExecuteQueryOperator(
        task_id='t1',
        sql='SELECT 1;',
        labels={'some_additional_label2': 'some_additional_label2', 'ds': '{{ ds }}'}
    )
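The '{{ ds }}' entry is rendered at run time because labels is one of the operator's templated fields. A quick check against your installed version (import path assumes the (backport) providers package):

from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator

# 'labels' should be listed here, which is why '{{ ds }}' above gets rendered
print(BigQueryExecuteQueryOperator.template_fields)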

The airflow_local_settings.py file needs to be on your PYTHONPATH. You can put it under $AIRFLOW_HOME/config or inside your dags directory.
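To verify the file is picked up from where you placed it, a quick sanity check (a sketch; Airflow imports the module by this exact name):

import importlib

# Raises ModuleNotFoundError if airflow_local_settings.py is not on PYTHONPATH
importlib.import_module('airflow_local_settings')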
