
Data streaming from Raspberry Pi CSV file to BigQuery table

I have some CSV files generated by a Raspberry Pi that need to be pushed into BigQuery tables. Currently, we have a Python script that uses bigquery.LoadJobConfig for batch uploads, and I run it manually. The goal is to get streaming data (or a load every 15 minutes) in a simple way.
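For context, a minimal sketch of the kind of batch-upload script described above, assuming the CSV files sit in a local directory (directory layout, table id, and the header-row/autodetect settings are illustrative assumptions; actually running the load requires `pip install google-cloud-bigquery` and GCP credentials):

```python
# Sketch of a batch CSV-to-BigQuery loader; names and paths are illustrative.
import glob
import os
import time


def recent_csvs(directory, window_secs=15 * 60):
    """Return CSV paths in `directory` modified within the last `window_secs` seconds."""
    cutoff = time.time() - window_secs
    return [
        path
        for path in glob.glob(os.path.join(directory, "*.csv"))
        if os.path.getmtime(path) >= cutoff
    ]


def load_csv(table_id, csv_path):
    """Append one CSV file to a BigQuery table via a batch load job."""
    from google.cloud import bigquery  # needs GCP credentials at runtime

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes each CSV has a header row
        autodetect=True,      # or pass an explicit schema instead
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    return job.result()  # blocks until the load job finishes
```

Calling `load_csv("my-project.sensor_data.pi_readings", path)` for each path from `recent_csvs(...)` is the whole 15-minute batch; everything below is about who triggers that call.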

I explored different solutions:

  1. Using Airflow to run the Python script (high complexity and maintenance overhead)
  2. Dataflow (I am not familiar with it, but if it does the job I will use it)
  3. A scheduled pipeline that runs the script through GitLab CI (cron syntax: */15 * * * *)

Could you please suggest the best way to push CSV files into BigQuery tables in real time, or every 15 minutes?

Good news: you have many options. Perhaps the easiest would be to automate the Python script you currently have, since it already does what you need. Assuming you run it manually on a local machine, you could upload it to a lightweight VM on Google Cloud, then use cron on the VM to run it on a schedule. I have used this approach in the past and it worked well.
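The cron entry on the VM could look something like this (every value here is an example: the interpreter path, the script location, and the log file are all placeholders for wherever your script actually lives):

```
# Hypothetical crontab entry (edit with `crontab -e`): run the loader every 15 minutes
*/15 * * * * /usr/bin/python3 /opt/pi-loader/load_csvs.py >> /var/log/pi-loader.log 2>&1
```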

Another option would be to deploy your Python code as a Google Cloud Function, a way to let GCP run the code without you having to worry about maintaining the backend resources.

Find out more about Cloud Functions here: https://cloud.google.com/functions

A third option: depending on where your .csv files are being generated, perhaps you could use the BigQuery Data Transfer Service to handle the imports into BigQuery.

More on that here: https://cloud.google.com/bigquery/docs/dts-introduction

Good luck!

Adding to @Ben's answer, you can also use Cloud Composer to orchestrate this workflow. It is built on Apache Airflow, so you can use Airflow-native tools, such as the powerful Airflow web interface, command-line tools, and the Airflow scheduler, without worrying about your infrastructure and maintenance.

You can implement DAGs to
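A minimal sketch of what such a DAG might look like, reusing the question's 15-minute cron cadence (the dag_id, task name, and the placeholder callable are illustrative; the airflow import is guarded only so the snippet also runs outside a Composer environment, which you would not do in a real DAG file):

```python
# Hypothetical Airflow DAG for Cloud Composer: run the existing batch
# loader every 15 minutes. Written against the Airflow 2.x API.
from datetime import datetime

SCHEDULE = "*/15 * * * *"  # standard 5-field cron expression


def push_csvs_to_bigquery():
    """Placeholder: call the existing LoadJobConfig batch script here."""
    pass


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="pi_csv_to_bigquery",   # illustrative name
        start_date=datetime(2024, 1, 1),
        schedule_interval=SCHEDULE,
        catchup=False,
    ) as dag:
        load = PythonOperator(
            task_id="load_csvs",
            python_callable=push_csvs_to_bigquery,
        )
except ImportError:
    pass  # apache-airflow not installed; DAG shown for illustration only
```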

More on Cloud Composer

