Data streaming from Raspberry Pi CSV files to a BigQuery table
I have some CSV files generated by a Raspberry Pi that need to be pushed into BigQuery tables. Currently, we have a Python script that uses bigquery.LoadJobConfig for batch uploads, and I run it manually. The goal is to stream the data (or load it every 15 minutes) in a simple way.
I explored different solutions:
Could you please help me and suggest the best way to push CSV files into BigQuery tables in real time, or every 15 minutes?
Good news: you have many options. Perhaps the easiest would be to automate the Python script that you currently have, since it already does what you need. Assuming you are running it manually on a local machine, you could upload it to a lightweight VM on Google Cloud, then use cron on the VM to run it on a schedule. I used this approach in the past and it worked well.
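For example, a crontab entry on the VM along these lines would run the existing script every 15 minutes (the interpreter, script path, and log file are placeholders):

```shell
# Edit the VM's crontab with `crontab -e` and add a line like this.
# Runs at :00, :15, :30, :45 and appends stdout/stderr to a log file.
*/15 * * * * /usr/bin/python3 /home/pi_uploads/upload_to_bigquery.py >> /var/log/bq_upload.log 2>&1
```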
Another option would be to deploy your Python code to a Google Cloud Function, a way to let GCP run the code without you having to worry about maintaining the backend resources.
Find out more about Cloud Functions here: https://cloud.google.com/functions
A third option: depending on where your .csv files are being generated, perhaps you could use the BigQuery Data Transfer Service to handle the imports into BigQuery.
More on that here: https://cloud.google.com/bigquery/docs/dts-introduction
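If the Pi can first copy the files into a Cloud Storage bucket, a scheduled GCS-to-BigQuery transfer can be created with the bq CLI, roughly like this (dataset, bucket, and table names are placeholders; double-check the parameter set against the current docs):

```shell
# Hypothetical sketch: schedule a GCS -> BigQuery transfer every 15 minutes.
bq mk --transfer_config \
  --data_source=google_cloud_storage \
  --target_dataset=my_dataset \
  --display_name="pi-csv-import" \
  --schedule="every 15 minutes" \
  --params='{
    "data_path_template": "gs://my-pi-bucket/incoming/*.csv",
    "destination_table_name_template": "sensor_readings",
    "file_format": "CSV",
    "skip_leading_rows": "1"
  }'
```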
Good luck!
Adding to @Ben's answer, you can also use Cloud Composer to orchestrate this workflow. It is built on Apache Airflow, so you can use Airflow-native tools, such as the powerful Airflow web interface, command-line tools, and the Airflow scheduler, without worrying about your infrastructure and maintenance.
You can implement DAGs that use the GCSToBigQueryOperator.
More on Cloud Composer: https://cloud.google.com/composer