Is it possible for Airflow to manage UDF creation in BigQuery?
I use Airflow for various ETL work, but I've also started using UDFs heavily.
I'd like to organize my UDFs in a dataset, my_project.my_udfs, and I was hoping to be able to utilize Airflow for this purpose. Is there a way to do so?
I ultimately want to be able to schedule queries like this; a simple example:
CREATE FUNCTION `my_project.my_udfs.normalize`(s STRING)
RETURNS STRING
AS (TRIM(LOWER(s)));
A couple of answers to questions you may be thinking of:
Thanks in advance!
If you're storing your UDF inside BigQuery, you can use a BigQuery hook and pass in some basic SQL to execute it.
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
bq_hook = BigQueryHook(gcp_conn_id)
results = bq_hook.get_records('select * from my_table')
Replace my_table with a query that calls your UDF, and that should return a result set for you.
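Since the question is about scheduling the CREATE FUNCTION DDL itself, the same hook can run that statement from a task callable. A minimal sketch under assumptions: build_udf_ddl and create_normalize_udf are hypothetical helpers (not part of the hook's API), and the DDL is executed through get_records as in the snippet above. Note that CREATE FUNCTION is standard-SQL-only, hence use_legacy_sql=False.

```python
# Sketch: create or refresh a persistent BigQuery SQL UDF from an Airflow task.
# build_udf_ddl / create_normalize_udf are illustrative names, not Airflow API.

def build_udf_ddl(project: str, dataset: str, name: str,
                  params: str, returns: str, body: str) -> str:
    """Assemble a CREATE OR REPLACE FUNCTION statement for a SQL UDF."""
    return (
        f"CREATE OR REPLACE FUNCTION `{project}.{dataset}.{name}`({params})\n"
        f"RETURNS {returns}\n"
        f"AS ({body});"
    )

def create_normalize_udf():
    # Import inside the callable so the DAG file still parses on workers
    # that don't need the Google provider.
    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

    ddl = build_udf_ddl("my_project", "my_udfs", "normalize",
                        "s STRING", "STRING", "TRIM(LOWER(s))")
    # CREATE FUNCTION is DDL; it executes and returns no rows.
    hook = BigQueryHook(gcp_conn_id="google_cloud_default",
                        use_legacy_sql=False)
    hook.get_records(ddl)
```

Wiring create_normalize_udf into a PythonOperator (or an @task-decorated function) then lets you recreate the UDF on whatever schedule the DAG runs.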
Alternatively if you don't have persistent UDFs or want to pass something in each time, you could store some SQL in an XML file that lives next to your Python code and grab it from there when you want to execute it.