
Which GCP component should I use to fetch data from an API?

I'm a little bit confused about the GCP components. Here is my use case:
every day, I need to fetch data from an external API (the API returns JSON data), store it in GCS, and then load it into BigQuery.
I have already written the Python script that fetches the data and stores it in GCS, and I'm not sure which component to use to deploy it:

  • Cloud Run: according to the docs, it is used for deploying services, so I think it's a bad choice.
  • Cloud Functions: I think it would work, but it is meant for event-based processing (through single-purpose functions...).
  • Composer: I'll use Composer to orchestrate tasks (such as preprocessing files in GCS, loading them into BQ, and transferring them to an archive bucket), so I could create a task, through KubernetesPodOperator, that triggers the script to get the data.
  • Compute Engine: I don't think it's the best choice, since there are better ones.
  • App Engine: I don't think it's a good idea either, since it is used to deploy and scale web apps...

(Correct me if I'm wrong about anything I said.)
So my question is: which GCP component is meant for this kind of task?
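For reference, the daily job described in the question (fetch JSON from an API, write it to GCS, load it into BigQuery) might look roughly like this minimal sketch. The API URL, bucket name, object prefix and table ID are hypothetical, and writing the file as newline-delimited JSON is an assumption made because that is the JSON layout BigQuery load jobs read. The Google client libraries are imported inside the functions so the pure helpers can be used without them.

```python
import datetime
import json
import urllib.request


def fetch_records(api_url: str, timeout: float = 60.0) -> list:
    """GET the (hypothetical) API URL and return its JSON body as a list."""
    with urllib.request.urlopen(api_url, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: the API returns either a JSON array or a single object.
    return payload if isinstance(payload, list) else [payload]


def to_ndjson(records: list) -> str:
    """Newline-delimited JSON: one JSON object per line."""
    return "".join(json.dumps(r) + "\n" for r in records)


def object_name(prefix: str, day: datetime.date) -> str:
    """One object per day, e.g. raw/2024-01-31.json (hypothetical layout)."""
    return f"{prefix.rstrip('/')}/{day.isoformat()}.json"


def upload_to_gcs(bucket_name: str, name: str, data: str) -> str:
    """Write the data to GCS and return its gs:// URI."""
    from google.cloud import storage  # pip install google-cloud-storage

    blob = storage.Client().bucket(bucket_name).blob(name)
    blob.upload_from_string(data, content_type="application/json")
    return f"gs://{bucket_name}/{name}"


def load_into_bigquery(gcs_uri: str, table_id: str) -> None:
    """Append the file to a BigQuery table and wait for the load job."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # assumption: let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client = bigquery.Client()
    client.load_table_from_uri(gcs_uri, table_id, job_config=config).result()


def run_daily_job(api_url: str, bucket_name: str, table_id: str, prefix: str = "raw") -> str:
    """Fetch, store and load one day's data; returns the GCS URI written."""
    records = fetch_records(api_url)
    today = datetime.datetime.now(datetime.timezone.utc).date()
    uri = upload_to_gcs(bucket_name, object_name(prefix, today), to_ndjson(records))
    load_into_bigquery(uri, table_id)
    return uri
```

Whichever component ends up running it, the entry point only needs to call run_daily_job() once a day.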

  • Cloud Run: according to the docs, it is used for deploying services
  • App Engine: I don't think it's a good idea either, since it is used to deploy and scale web apps...

I think you've misunderstood. Both Cloud Run and Google App Engine (GAE) are serverless offerings from Google Cloud. You deploy your code to either of them, and you can invoke their URLs, which in turn causes your code to execute and do things like fetch data from somewhere and save it somewhere else.
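To make "invoke the URL and your code runs" concrete, here is a minimal sketch of a Cloud Run entry point written with only the Python standard library. It relies on the fact that Cloud Run sends requests to the port named in the PORT environment variable (8080 by default); run_job() is a hypothetical placeholder for the fetch/store script from the question.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_job() -> dict:
    # Hypothetical placeholder: call the real fetch -> GCS -> BigQuery code here.
    return {"status": "ok"}


class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Each POST to the service URL runs the job once.
        try:
            result, code = run_job(), 200
        except Exception as exc:
            # A non-2xx status marks the invocation as failed for the caller.
            result, code = {"status": "error", "detail": str(exc)}, 500
        body = json.dumps(result).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> HTTPServer:
    return HTTPServer(("", port), JobHandler)


if __name__ == "__main__":
    make_server(int(os.environ.get("PORT", "8080"))).serve_forever()
```

Only POST is handled, so an accidental GET (for example from a browser) does not trigger a run.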

Google App Engine (with automatic scaling) has a shorter request timeout than Cloud Run, whose services have a request timeout of 5 minutes by default that can be raised to 60 minutes. So, if your code will take a long time to run, you don't want to use Google App Engine (unless you make it a background task), and if you don't need a UI, then you don't need GAE.

For your specific scenario, you can deploy your code to Cloud Run and use Cloud Scheduler to invoke it at specific times. We have this architecture running in a similar scenario: we have a task that runs once a day; it's deployed to Cloud Run; Cloud Scheduler invokes the endpoint, and the task runs and saves data to a Datastore linked to an App Engine app. We wrote a blog article on deploying to Cloud Run and another on securing your Cloud Run service (based on our experience with the scenario described above).
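The scheduling half of that setup can be created with the google-cloud-scheduler Python client; the sketch below is one way to do it, not the answerer's actual configuration. The project ID, region, job ID, Cloud Run URL, invoker service account and the 06:00 UTC schedule are all hypothetical. The OIDC token is what lets Cloud Scheduler call a Cloud Run service that requires authentication (the service account needs permission to invoke the service).

```python
def job_path(project: str, location: str, job_id: str) -> str:
    return f"projects/{project}/locations/{location}/jobs/{job_id}"


def build_job(project: str, location: str, job_id: str, url: str,
              invoker_sa: str, schedule: str = "0 6 * * *",
              time_zone: str = "Etc/UTC") -> dict:
    """Describe the Scheduler job as a plain dict (hypothetical values)."""
    return {
        "name": job_path(project, location, job_id),
        "schedule": schedule,  # unix-cron: minute hour day-of-month month day-of-week
        "time_zone": time_zone,
        "http_target": {
            "uri": url,
            # Scheduler signs each request with an OIDC token for this account.
            "oidc_token": {"service_account_email": invoker_sa, "audience": url},
        },
    }


def create_job(project: str, location: str, job_id: str, url: str, invoker_sa: str):
    """Create the job; needs `pip install google-cloud-scheduler` and credentials."""
    from google.cloud import scheduler_v1

    job = build_job(project, location, job_id, url, invoker_sa)
    job["http_target"]["http_method"] = scheduler_v1.HttpMethod.POST
    client = scheduler_v1.CloudSchedulerClient()
    parent = f"projects/{project}/locations/{location}"
    return client.create_job(request={"parent": parent, "job": job})
```

Keeping the job description in build_job() makes it easy to review before anything is created in the project.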

GAE Timeout:

Every request to Google App Engine (Standard) must complete within 1 to 10 minutes with automatic scaling, and within up to 24 hours with basic scaling (see the documentation). For Google App Engine Flexible, the timeout is 60 minutes (see the documentation).
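As a rough decision aid, the limits quoted above can be checked against a job's expected runtime. In this sketch the App Engine values come from the text (using 10 minutes, the upper end, for automatic scaling), the Cloud Run value is its maximum configurable request timeout of 60 minutes, and the 2x safety factor is a hypothetical choice. Treat the numbers as a snapshot and check the current documentation.

```python
# Per-request limits in minutes (snapshot; verify against current docs).
REQUEST_LIMIT_MINUTES = {
    "App Engine standard (automatic scaling)": 10,
    "App Engine standard (basic scaling)": 24 * 60,
    "App Engine flexible": 60,
    "Cloud Run service (max configurable)": 60,
}


def options_that_fit(expected_minutes: float, safety_factor: float = 2.0) -> list:
    """Names of options whose limit covers the runtime times a safety margin."""
    needed = expected_minutes * safety_factor
    return sorted(name for name, limit in REQUEST_LIMIT_MINUTES.items()
                  if limit >= needed)
```

For a job expected to take about 20 minutes, this rules out App Engine standard with automatic scaling, which matches the advice above.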
