
Can I use Google Dataflow with native Python?

I'm trying to build a Python ETL pipeline in Google Cloud, and Google Cloud Dataflow seemed like a good option. When I explored the documentation and the developer guides, I saw that Apache Beam is always tied to Dataflow, since Dataflow is based on it. I may run into issues processing my dataframes in Apache Beam.

My questions are:

  • Is it possible to build my ETL script in native Python with Dataflow, or is it necessary to use Apache Beam for my ETL?
  • Was Dataflow built solely for use with Apache Beam? If so, is there any serverless Google Cloud tool for building a Python ETL? (Google Cloud Functions has a 9-minute execution limit, which may cause issues for my pipeline; I want to avoid an execution limit.)

My pipeline aims to read data from BigQuery, process it, and save it back to a BigQuery table. I may use some external APIs inside my script.

Concerning your first question, it looks like Dataflow was primarily written to be used with the Apache Beam SDK, as can be checked in the official Google Cloud documentation on Dataflow. So it is likely that using Apache Beam is actually a requirement for your ETL.

Regarding your second question, this tutorial gives guidance on how to build your own ETL pipeline with Python and Google Cloud Functions, which are serverless. Could you please confirm whether this link has helped you?

Regarding your first question, Dataflow requires Apache Beam. In fact, before Apache Beam there was the Dataflow SDK, which was proprietary to Google and was later open-sourced as Apache Beam.

The Python Beam SDK is fairly easy once you put a bit of effort into it, and the main processing operations you'd need are very close to native Python.

If your end goal is to read from, process, and write to BigQuery, I'd say Beam + Dataflow is a good match.
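To illustrate how close this stays to plain Python, here is a minimal sketch of such a pipeline, not a definitive implementation. The column names (`name`, `price`, `quantity`), the output schema, and the `--input_query` / `--output_table` arguments are hypothetical and only there for illustration; the processing step is an ordinary Python function that Beam applies to each row, and a call to an external API could live inside it.

```python
import argparse


def transform_row(row: dict) -> dict:
    """Plain-Python processing step; Beam calls it once per BigQuery row."""
    # Hypothetical logic: tidy a name and derive a total.
    # A call to an external API could also be made here.
    return {
        "name": row["name"].strip().title(),
        "total": row["price"] * row["quantity"],
    }


def run(argv=None):
    # Imported here so transform_row can be used and tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    parser = argparse.ArgumentParser()
    parser.add_argument("--input_query", required=True)   # standard SQL query
    parser.add_argument("--output_table", required=True)  # PROJECT:DATASET.TABLE
    known_args, pipeline_args = parser.parse_known_args(argv)

    # Remaining flags (runner, project, region, temp_location, ...) go to Beam.
    options = PipelineOptions(pipeline_args)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromBigQuery(
                query=known_args.input_query, use_standard_sql=True
            )
            | "Process" >> beam.Map(transform_row)
            | "Write" >> beam.io.WriteToBigQuery(
                known_args.output_table,
                schema="name:STRING,total:FLOAT",  # assumed output schema
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Calling `run([...])` with `--runner=DataflowRunner`, `--project`, `--region` and a `--temp_location` pointing at a Cloud Storage bucket you own would submit the job to Dataflow; with the default runner the same code runs locally, which is handy for checking the logic on a small query first.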
