
Is it possible for Airflow to manage UDF creation in BigQuery?

I use Airflow for various ETL work, but I've also started using UDFs heavily.

I'd like to organize my UDFs in a dataset, my_project.my_udfs, and I was hoping to be able to use Airflow for this purpose. Is there a way to do so?

Ultimately, I want to be able to schedule queries like this simple example:

CREATE FUNCTION `my_project.my_udfs.normalize`(s STRING)
  RETURNS STRING
  AS (TRIM(LOWER(s)));

A couple of answers to questions you may be thinking of:

  1. I'm part of a broader organization that uses Airflow, and the main benefit I want to leverage here is source control over these functions.
  2. The example is not such a case, but many of these functions will be updated periodically (monthly/quarterly).

Thanks in advance!

If you're storing your UDF inside BigQuery, you can use a BigQuery hook and pass in some basic SQL to execute it.

BigQuery Hooks

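from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

# Airflow connection that holds the GCP credentials (assumed: the provider's default ID)
gcp_conn_id = 'google_cloud_default'

# The hook defaults to legacy SQL; persistent UDFs need standard SQL
bq_hook = BigQueryHook(gcp_conn_id=gcp_conn_id, use_legacy_sql=False)
results = bq_hook.get_records('select * from my_table')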

Replace the my_table query with one that calls your UDF, and it should return a result set for you.
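The same hook can also submit the CREATE statement itself, which is what the question asks about. Below is a minimal sketch, not a definitive implementation. The names build_query_job and create_udf are hypothetical helpers, 'google_cloud_default' is assumed to be the connection ID, and using CREATE OR REPLACE (so that a periodically updated function can be redeployed) is my assumption rather than something from the question. Note the parentheses around the expression body, which BigQuery's SQL UDF syntax expects.

```python
# DDL for the UDF from the question, written as CREATE OR REPLACE (an assumption)
# so that re-running the task updates an existing function.
UDF_SQL = """
CREATE OR REPLACE FUNCTION `my_project.my_udfs.normalize`(s STRING)
  RETURNS STRING
  AS (TRIM(LOWER(s)));
"""


def build_query_job(sql):
    """Build a BigQuery query-job configuration that runs `sql` as standard SQL."""
    return {"query": {"query": sql.strip(), "useLegacySql": False}}


def create_udf(sql=UDF_SQL, gcp_conn_id="google_cloud_default"):
    """Submit the DDL statement through Airflow's BigQueryHook (hypothetical helper)."""
    # Imported lazily so build_query_job can be used without Airflow installed.
    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

    hook = BigQueryHook(gcp_conn_id=gcp_conn_id)
    # insert_job submits the job and, by default, waits for it to finish.
    return hook.insert_job(configuration=build_query_job(sql))
```

To schedule this, create_udf could be called from a PythonOperator task, or the same configuration dict could be handed to BigQueryInsertJobOperator.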

Alternatively, if you don't have persistent UDFs or want to pass something in each time, you could store some SQL in an XML file that lives next to your Python code and grab it from there when you want to execute it.
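As a rough sketch of the "SQL stored next to your Python code" idea, the loader below reads statements from files in a directory so they can live in source control. It uses plain .sql files rather than XML, which is a substitution on my part, and load_udf_sql and the directory layout are hypothetical.

```python
from pathlib import Path


def load_udf_sql(sql_dir):
    """Map each *.sql file stem in sql_dir to its statement text, sorted by file name."""
    return {
        path.stem: path.read_text(encoding="utf-8").strip()
        for path in sorted(Path(sql_dir).glob("*.sql"))
    }
```

Each loaded statement could then be passed to the BigQuery hook, for example load_udf_sql(Path(__file__).parent / "udfs") inside the DAG file, assuming a udfs folder sits beside it.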
