I have a simple script that retrieves data from an API and loads it into BigQuery. I was running it on Cloud Functions and it worked smoothly, until it eventually hit the 9-minute execution time limit.
What is the best way to do this in GCP, given how long the script takes? I was thinking of creating another Cloud Function that starts a preemptible VM daily; the VM would execute the script and then turn itself off. To keep the price low, the VM would always shut down as soon as the data load finishes and start again the next day at the scheduled time.
I don't know where to start with this, and I was wondering whether that would be the best approach.
Cloud Functions aren't really suited to batch jobs that may run longer than the 9-minute limit. I'd suggest running your job on a Compute Engine VM and scheduling it with a combination of Cloud Functions and Cloud Scheduler.
Here's a rough outline:
```python
import googleapiclient.discovery

def start_job(event, context):
    """Triggered by a message on a Cloud Pub/Sub topic.

    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    compute = googleapiclient.discovery.build('compute', 'v1')
    # vm_config must be defined elsewhere in the function: it is the
    # instance definition (machine type, disks, startup script, etc.).
    compute.instances().insert(
        project='project_id',  # replace with your project ID
        zone='us-east1-b',
        body=vm_config).execute()
```
This lets you avoid the cost of an always-on VM. See this blog post for more detail.
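To fire the function daily, Cloud Scheduler can publish to the Pub/Sub topic the function is subscribed to. A sketch of the deployment commands, assuming a topic named `start-job` (topic name, schedule, and time zone are placeholders you'd adjust):

```shell
# Create the topic the Cloud Function subscribes to (name is a placeholder).
gcloud pubsub topics create start-job

# Publish a message to it every day at 08:00 (standard cron syntax).
gcloud scheduler jobs create pubsub start-job-daily \
    --schedule="0 8 * * *" \
    --topic=start-job \
    --message-body="start" \
    --time-zone="Etc/UTC"
```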
Could this work?
```python
import schedule
import time

def run_daily():
    # placeholder: fetch from the API and load into BigQuery here
    pass

schedule.every().day.at("08:20:30").do(run_daily)  # "HH:MM:SS"

while True:
    schedule.run_pending()
    time.sleep(1)
```