
Is it possible to run a custom Python script in Apache Beam or Google Cloud Dataflow?

I want to run one of my Python scripts on GCP. I am fairly new to GCP, so I don't have a good sense of the options.

My Python script grabs data from BigQuery and performs these tasks:

Runs several data-processing operations

Builds an ML model using a KDTree and a few clustering algorithms

Dumps the final result to a BigQuery table.

This script needs to run every night.
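For context, a rough sketch of the shape of such a script (the table names, feature columns, and cluster count below are placeholder assumptions, not details from the original question):

    # Nightly job: read from BigQuery, process, fit KDTree + clustering, write back.
    from google.cloud import bigquery
    from scipy.spatial import cKDTree
    from sklearn.cluster import KMeans

    def run_nightly_job():
        client = bigquery.Client()

        # 1. Grab data from BigQuery (placeholder table/columns).
        df = client.query(
            "SELECT id, feature_a, feature_b FROM `my_project.my_dataset.source`"
        ).to_dataframe()

        # 2. Data processing (placeholder step: drop incomplete rows).
        df = df.dropna()

        # 3. ML: a KDTree for neighbour lookups plus a clustering pass.
        points = df[["feature_a", "feature_b"]].to_numpy()
        tree = cKDTree(points)
        dist, _ = tree.query(points, k=2)   # dist[:, 1] = nearest other point
        df["nn_distance"] = dist[:, 1]
        df["cluster"] = KMeans(n_clusters=5, n_init=10).fit_predict(points)

        # 4. Dump the result to a BigQuery table.
        client.load_table_from_dataframe(
            df, "my_project.my_dataset.results"
        ).result()

    if __name__ == "__main__":
        run_nightly_job()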

So far I know I can use VMs, Cloud Run, or Cloud Functions (not a good option for me, since the whole job takes about an hour to finish). What would be the best choice for running this?

I also came across Dataflow, and I am curious whether it is possible to run a custom Python script that can do all these things in Google Cloud Dataflow (assuming I would have to convert everything into a map/reduce-style format, which doesn't seem easy with my code, especially the ML part).
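For reference, the element-wise steps would translate to Beam fairly directly; it is the global model fit that does not. A rough sketch of the pipeline shape (table names and the per-row transform are placeholders, assuming the apache_beam BigQuery connectors):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def clean_row(row):
        # Element-wise processing maps cleanly onto beam.Map.
        row["feature_a"] = float(row.get("feature_a") or 0)
        return row

    with beam.Pipeline(options=PipelineOptions()) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromBigQuery(
                table="my_project:my_dataset.source")
            | "Clean" >> beam.Map(clean_row)
            # Fitting a single KDTree over the whole dataset has no
            # per-element equivalent; it would need a global combine or a
            # side input, which is exactly the awkward part for ML steps.
            | "Write" >> beam.io.WriteToBigQuery(
                "my_project:my_dataset.results",
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
        )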

Do you just need a Python script to run on a single instance for a couple of hours and then terminate?

You could set up a 'basic scaling' App Engine microservice within your GCP project. The maximum run time for task queue tasks is 24 hours when using 'basic scaling'.

Requests can run for up to 24 hours. A basic-scaled instance can choose to handle /_ah/start and execute a program or script for many hours without returning an HTTP response code. Task queue tasks can run up to 24 hours.

https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed
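A minimal sketch of that setup, assuming the Python 3 standard runtime with Flask (the service name and the job function are placeholders; /_ah/start is the request App Engine sends when a basic-scaled instance boots):

    # main.py -- deployed as a basic-scaling App Engine service.
    # The matching app.yaml would contain roughly (assumed values):
    #   service: nightly-job
    #   runtime: python39
    #   basic_scaling:
    #     max_instances: 1
    #     idle_timeout: 10m
    from flask import Flask

    app = Flask(__name__)

    @app.route("/_ah/start")
    def start():
        # Under basic scaling this handler may run for many hours
        # (up to 24) before returning an HTTP response.
        run_nightly_job()  # placeholder for the BigQuery + KDTree script
        return "ok", 200

    def run_nightly_job():
        # Existing script: BigQuery read -> processing -> KDTree/clustering
        # -> write results back to BigQuery.
        pass

A nightly App Engine cron entry (cron.yaml) or a task queue task pointed at this service would then trigger the run each night.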
