Schedule loading data from GCS to BigQuery periodically

I've researched this and have come up with a strategy that uses Apache Airflow, but I'm still not sure how to implement it. Most of the blogs and answers I'm finding are just code, rather than material that helps me understand the approach. Please also suggest whether there is a better way to do it.

I also got an answer suggesting a Background Cloud Function with a Cloud Storage trigger.
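For comparison, here is a minimal sketch of that suggestion, assuming a 1st-gen Python background function deployed with a `google.storage.object.finalize` trigger on the bucket. Note that this approach is event-driven rather than scheduled: each file is loaded as soon as it lands. The table ID, the `.csv` filter and the helper names are hypothetical; the load itself uses `google-cloud-bigquery`'s `Client.load_table_from_uri` with a `LoadJobConfig`.

```python
# Hypothetical destination table; replace with your own project.dataset.table.
TABLE_ID = "my-project.my_dataset.my_table"


def build_source_uri(event):
    """Turn a Cloud Storage event payload into a gs:// URI."""
    return f"gs://{event['bucket']}/{event['name']}"


def should_load(object_name, suffix=".csv"):
    """Only load objects with the expected extension (hypothetical filter)."""
    return object_name.lower().endswith(suffix)


def gcs_to_bigquery(event, context):
    """Background Cloud Function entry point (1st-gen event, context signature)."""
    if not should_load(event["name"]):
        print(f"Skipping {event['name']}")
        return

    # Imported here so the helpers above can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes each CSV has a header row
        autodetect=True,      # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    uri = build_source_uri(event)
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # wait for the load job; raises if it failed
    print(f"Loaded {uri} into {TABLE_ID}")
```

The lazy import keeps the event-parsing helpers testable on their own; in a deployed function you could just as well import `bigquery` at module level.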

You can use BigQuery's Cloud Storage transfers, but note that the feature was still in beta at the time of writing.

It lets you schedule transfers from Cloud Storage to BigQuery, subject to certain limitations.
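As a rough sketch, such a scheduled transfer can be created with the `google-cloud-bigquery-datatransfer` client library, using the `google_cloud_storage` data source and a custom schedule string. The project, dataset, paths and helper names below are hypothetical, and only a minimal set of transfer parameters is shown; check the transfer documentation for the full list and the limitations mentioned above.

```python
PROJECT_ID = "my-project"  # hypothetical project
DATASET_ID = "my_dataset"  # hypothetical destination dataset


def build_gcs_transfer_params(data_path, table_name, file_format="CSV"):
    """Minimal parameter set for a Cloud Storage transfer (values are strings)."""
    return {
        "data_path_template": data_path,  # may contain wildcards, e.g. gs://bucket/*.csv
        "destination_table_name_template": table_name,
        "file_format": file_format,
    }


def create_gcs_transfer(data_path, table_name, schedule="every 24 hours"):
    """Create a recurring GCS -> BigQuery transfer config and return it."""
    # Imported here so the parameter builder can be used without the library.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=DATASET_ID,
        display_name=f"GCS to {table_name}",
        data_source_id="google_cloud_storage",
        params=build_gcs_transfer_params(data_path, table_name),
        schedule=schedule,
    )
    return client.create_transfer_config(
        parent=client.common_project_path(PROJECT_ID),
        transfer_config=transfer_config,
    )
```

Once created, the schedule runs inside Google Cloud, so there is no scheduler of your own to operate.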


"Most of the blogs and answers I'm finding are just code"

Apache Airflow comes with a rich UI for many tasks, but that doesn't mean you can avoid writing code to get your task done.

In your case, you need to use a BigQuery command-line operator for Apache Airflow.
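One way to read "command-line operator" is a `BashOperator` that runs `bq load` on a schedule. The DAG below is a minimal sketch under several assumptions: Airflow 2.4 or later (for the `schedule` argument and the `airflow.operators.bash` import path), a worker with the Cloud SDK `bq` tool installed and authenticated, and a hypothetical bucket layout with one folder per day. The DAG name, table and paths are made up for illustration. The Google provider's `GCSToBigQueryOperator` is an alternative that avoids the shell entirely.

```python
import shlex
from datetime import datetime

try:
    from airflow import DAG
    from airflow.operators.bash import BashOperator  # Airflow 2.x import path
except ImportError:  # lets the command builder be imported without Airflow
    DAG = None


def build_bq_load_command(dataset_table, source_uri, source_format="CSV", skip_leading_rows=1):
    """Return a `bq load` command; arguments are shell-quoted so the gs:// wildcard is not globbed locally."""
    args = [
        "bq", "load",
        f"--source_format={source_format}",
        f"--skip_leading_rows={skip_leading_rows}",
        "--autodetect",
        dataset_table,
        source_uri,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if DAG is not None:
    with DAG(
        dag_id="gcs_to_bigquery_daily",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",               # older Airflow versions use schedule_interval
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="bq_load",
            # Airflow renders {{ ds }} as the run's logical date (YYYY-MM-DD).
            bash_command=build_bq_load_command(
                "my_dataset.my_table",                    # hypothetical table
                "gs://my-bucket/exports/{{ ds }}/*.csv",  # hypothetical path layout
            ),
        )
```

Templating the date into the path means each daily run loads only that day's files, and rerunning a past date from the UI reloads the matching folder.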


A good example of how to do this can be found in this link.
