Can I use Google Dataflow with native Python?
I'm trying to build a Python ETL pipeline in Google Cloud, and Google Cloud Dataflow seemed like a good option. When I explored the documentation and the developer guides, I saw that Apache Beam is always attached to Dataflow, since Dataflow is based on it. I may run into issues processing my dataframes in Apache Beam.
My questions are:
My pipeline aims to read data from BigQuery, process it, and save it back to a BigQuery table. I may use some external APIs inside my script.
Concerning your first question, it looks like Dataflow was primarily written to be used with the Apache Beam SDK, as can be checked in the official Google Cloud documentation on Dataflow. So it is quite possible that using Apache Beam is actually a requirement for your ETL.
Regarding your second question, this tutorial gives you guidance on how to build your own ETL pipeline with Python and Google Cloud Platform functions, which are serverless. Could you please confirm whether this link has helped you?
Regarding your first question, Dataflow needs Apache Beam. In fact, before Apache Beam there was the Dataflow SDK, which was proprietary to Google and was later open-sourced as Apache Beam.
The Python Beam SDK is fairly easy once you put a bit of effort into it, and the main processing operations you'd need are very close to native Python.
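To illustrate how close this can be to native Python, here is a minimal sketch in which the processing logic is written as ordinary functions and Beam only wires them together. The field names (`name`, `price`, `quantity`) and the helper names are made up for the example, and the pipeline runs on the local DirectRunner rather than on Dataflow.

```python
def clean_row(row: dict) -> dict:
    """Plain Python: tidy up a name and compute a derived field."""
    return {
        "name": row["name"].strip().title(),
        "total": row["price"] * row["quantity"],
    }


def is_valid(row: dict) -> bool:
    """Plain Python predicate: keep only rows with a positive total."""
    return row["total"] > 0


def run_locally(rows):
    # Imported here so the functions above can be used and unit-tested
    # without Beam installed (requires `pip install apache-beam` to run).
    import apache_beam as beam

    # With no options given, Beam uses the local DirectRunner.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(rows)        # in-memory input (hypothetical data)
            | "Clean" >> beam.Map(clean_row)       # apply a normal Python function
            | "KeepValid" >> beam.Filter(is_valid) # filter with a normal predicate
            | "Print" >> beam.Map(print)
        )
```

Calling `run_locally([{"name": " widget ", "price": 2.5, "quantity": 4}])` would push that row through both functions. Because `clean_row` and `is_valid` do not depend on Beam, they can be tested like any other Python code.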
If your end goal is to read from, process, and write to BigQuery, I'd say Beam + Dataflow is a good match.
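As a rough sketch of that read, process, write flow, something along these lines should work with the Beam Python SDK. The project, dataset, table names, columns, and the `transform` function are all placeholders invented for the example, so adapt them to your own schema.

```python
def transform(row: dict) -> dict:
    """Plain Python per-row logic; BigQuery rows arrive as dicts.

    A call to an external API could also go here (hypothetical placement).
    """
    return {
        "id": row["id"],
        "amount": row["amount"],
        "amount_cents": int(round(row["amount"] * 100)),
    }


def run(argv=None):
    # Requires `pip install "apache-beam[gcp]"`.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Runner, project, region and temp_location are taken from the command line.
    options = PipelineOptions(argv)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
                # Placeholder query and table name.
                query="SELECT id, amount FROM `my-project.my_dataset.source_table`",
                use_standard_sql=True,
            )
            | "Transform" >> beam.Map(transform)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.target_table",  # placeholder table
                schema="id:INTEGER,amount:FLOAT,amount_cents:INTEGER",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )
```

To run it on Dataflow you would call `run()` from a script and pass flags such as `--runner=DataflowRunner --project=... --region=... --temp_location=gs://...`. The BigQuery read needs a Cloud Storage temp location, and leaving out `--runner` runs the same pipeline locally.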