Extend BigQueryExecuteQueryOperator with additional labels using jinja2
In order to track GCP costs using labels, I would like to extend BigQueryExecuteQueryOperator with some additional labels so that each task instance gets these labels set automatically in its constructor.
# Import paths assume the google provider package; in Airflow 1.10.x the
# operator lives under a contrib/gcp module instead.
from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator
from airflow.utils.decorators import apply_defaults


class ExtendedBigQueryExecuteQueryOperator(BigQueryExecuteQueryOperator):

    @apply_defaults
    def __init__(self, *args, **kwargs) -> None:
        task_labels = {
            'dag_id': '{{ dag.dag_id }}',
            'task_id': kwargs.get('task_id'),
            'ds': '{{ ds }}',
            # ugly, all three params got in diff. ways
        }
        super().__init__(*args, **kwargs)
        if self.labels is None:
            self.labels = task_labels
        else:
            self.labels.update(task_labels)
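The merge behaviour in that constructor can be checked in isolation with plain dicts, no Airflow required. The sketch below mirrors the `None`-check-then-`update` logic; the label values are illustrative:

```python
def merge_labels(existing, task_labels):
    # Mirrors the operator's constructor: start from the user-supplied
    # labels (possibly None) and overlay the automatic task labels.
    if existing is None:
        return dict(task_labels)
    merged = dict(existing)
    merged.update(task_labels)
    return merged


auto = {'dag_id': '{{ dag.dag_id }}', 'task_id': 't1', 'ds': '{{ ds }}'}
print(merge_labels(None, auto))
print(merge_labels({'some_additional_label2': 'x'}, auto))
```

Note that the automatic labels win on key collisions, since they are applied last.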
with DAG(dag_id=...,
         start_date=...,
         schedule_interval=...,
         default_args=...) as dag:
    t1 = ExtendedBigQueryExecuteQueryOperator(
        task_id='t1',
        sql='SELECT 1;',
        labels={'some_additional_label2': 'some_additional_label2'}
        # all labels should be: dag_id, task_id, ds, some_additional_label2
    )
    t2 = ExtendedBigQueryExecuteQueryOperator(
        task_id='t2',
        sql='SELECT 2;',
        labels={'some_additional_label3': 'some_additional_label3'}
        # all labels should be: dag_id, task_id, ds, some_additional_label3
    )
    t1 >> t2
but then I lose the task-level labels some_additional_label2 and some_additional_label3.
You could create the following policy in airflow_local_settings.py:
def policy(task):
    if task.__class__.__name__ == "BigQueryExecuteQueryOperator":
        # Merge rather than update() so this also works when labels is None.
        task.labels = {**(task.labels or {}), 'dag_id': task.dag_id, 'task_id': task.task_id}
From the docs:

Your local Airflow settings file can define a policy function that has the ability to mutate task attributes based on other task or DAG attributes. It receives a single argument as a reference to task objects, and is expected to alter its attributes.
More details on applying a policy: https://airflow.readthedocs.io/en/1.10.9/concepts.html#cluster-policy
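To see what the policy does without running a scheduler, you can apply it to a stand-in object. The class below is a stub that only shares its name and the relevant attributes with the real operator; it is purely illustrative, not an Airflow API:

```python
class BigQueryExecuteQueryOperator:
    """Stub with the same class name as the real operator, for illustration only."""

    def __init__(self, dag_id, task_id, labels=None):
        self.dag_id = dag_id
        self.task_id = task_id
        self.labels = labels


def policy(task):
    # Same shape as the policy in airflow_local_settings.py; the dict merge
    # guards against labels being None.
    if task.__class__.__name__ == "BigQueryExecuteQueryOperator":
        task.labels = {**(task.labels or {}), 'dag_id': task.dag_id, 'task_id': task.task_id}


t = BigQueryExecuteQueryOperator('my_dag', 't1', labels={'team': 'data'})
policy(t)
print(t.labels)  # {'team': 'data', 'dag_id': 'my_dag', 'task_id': 't1'}
```

In a real deployment Airflow calls the policy for you when tasks are loaded; the stub only exists to show that the task-level labels survive the merge.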
You won't need to extend BigQueryExecuteQueryOperator in that case. The only missing part is execution_date, which you can set in the task itself.
Example:
with DAG(dag_id=...,
         start_date=...,
         schedule_interval=...,
         default_args=...) as dag:
    t1 = BigQueryExecuteQueryOperator(
        task_id='t1',
        sql='SELECT 1;',
        labels={'some_additional_label2': 'some_additional_label2', 'ds': '{{ ds }}'}
    )
The airflow_local_settings file needs to be on your PYTHONPATH. You can put it under $AIRFLOW_HOME/config or inside your dags directory.
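Concretely, the setup can look like the sketch below. The paths are illustrative (AIRFLOW_HOME defaults to ~/airflow here), and the final line only verifies that the module is importable:

```shell
# Illustrative setup; adjust AIRFLOW_HOME to your installation.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
mkdir -p "$AIRFLOW_HOME/config"

# Write the policy file where Airflow's config dir lives.
cat > "$AIRFLOW_HOME/config/airflow_local_settings.py" <<'EOF'
def policy(task):
    if task.__class__.__name__ == "BigQueryExecuteQueryOperator":
        task.labels = {**(task.labels or {}), "dag_id": task.dag_id, "task_id": task.task_id}
EOF

# Make the config dir importable, then verify.
export PYTHONPATH="$AIRFLOW_HOME/config:$PYTHONPATH"
python3 -c "import airflow_local_settings; print('ok')"
```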