
Google Cloud Dataflow (Python): function to join multiple files

I am new to Google Cloud and know enough Python to write a few scripts. I am currently learning Cloud Functions and BigQuery.

My question: I need to join a large CSV file with multiple lookup files and replace values in it with the matching values from the lookup files.

I have learned that Dataflow can be used for ETL, but I don't know how to write the code in Python.

Could you please share your insights? I appreciate your help.

Rather than joining the data in Python, I suggest you extract and load the CSV and the lookup data separately into BigQuery. Then run a BigQuery query that joins the data and writes the result to a permanent table. You can then delete the separately imported data.
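As a rough sketch of that approach, the code below builds the join query in plain Python and then runs it with the google-cloud-bigquery client. Everything specific here is an assumption for illustration: the table names, the column names, the `gs://` URIs, and the helper names (`build_join_query`, `load_csv`, `run_join`) are hypothetical and not from the question. It also assumes each lookup table has one row per key and that the key columns have matching types on both sides of the join.

```python
from typing import Dict, List, Tuple

# (lookup_table, key_column, value_column); all names are hypothetical.
LookupSpec = Tuple[str, str, str]


def build_join_query(main_table: str, columns: List[str],
                     lookups: Dict[str, LookupSpec]) -> str:
    """Build a SELECT that replaces each looked-up column with its lookup value."""
    select_parts = []
    join_parts = []
    for col in columns:
        if col in lookups:
            table, key_col, value_col = lookups[col]
            alias = f"l{len(join_parts)}"
            # Keep the original value (as a string) when no lookup row matches.
            select_parts.append(
                f"COALESCE({alias}.{value_col}, CAST(m.{col} AS STRING)) AS {col}")
            join_parts.append(
                f"LEFT JOIN `{table}` AS {alias} ON m.{col} = {alias}.{key_col}")
        else:
            select_parts.append(f"m.{col}")
    lines = ["SELECT", "  " + ",\n  ".join(select_parts),
             f"FROM `{main_table}` AS m"]
    lines.extend(join_parts)
    return "\n".join(lines)


def load_csv(uri: str, table_id: str) -> None:
    """Load a CSV file from Cloud Storage into a BigQuery table."""
    # Needs `pip install google-cloud-bigquery` and configured credentials.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the file has a header row
        autodetect=True,      # let BigQuery infer the schema
    )
    client.load_table_from_uri(uri, table_id, job_config=job_config).result()


def run_join(sql: str, destination_table: str) -> None:
    """Run the join and write the result to a permanent table."""
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        destination=destination_table,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.query(sql, job_config=job_config).result()  # wait for completion


if __name__ == "__main__":
    # Only prints the generated SQL; no cloud calls are made here.
    print(build_join_query(
        "my-project.staging.orders",
        ["order_id", "country", "amount"],
        {"country": ("my-project.staging.countries", "code", "name")},
    ))
```

In practice you would call `load_csv` once for the main file and once per lookup file, then `run_join` with the generated SQL. The staging tables can be removed afterwards with `client.delete_table(table_id, not_found_ok=True)`. If a lookup table contains duplicate keys, the LEFT JOIN will duplicate rows from the main table, so it is worth checking that first.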

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 