
Web Scraping with Google Compute Engine / App Engine

I've written a Python script that uses Selenium to scrape information from a website and stores it in a CSV file. It works well on my local machine when I execute it manually, but I now want to run the script automatically once per hour for several weeks and save the data in a database. The script takes about 5-10 minutes to run.

I've just started with Google Cloud, and it looks like there are several ways of implementing this with either Compute Engine or App Engine. So far, I get stuck at a certain point with each of the three approaches I've found (e.g. getting the scheduled task to call a URL on my backend instance, and getting that instance to kick off the script). I've tried to:

  • Execute the script on Compute Engine and store the results in Datastore or Cloud SQL (a storage sketch follows this list). It's unclear to me whether crontab can easily be set up there.
  • Use Task Queues and Scheduled Tasks on App Engine.
  • Use a backend instance and Scheduled Tasks on App Engine.
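For the storage half of the first option, here is a minimal sketch of writing one scraped row into a Cloud SQL (MySQL) database. It assumes the pymysql driver, placeholder connection details, and a hypothetical results table with scraped_at and value columns:

    import pymysql

    # Hypothetical connection details -- use your Cloud SQL instance's
    # address, credentials, and database name.
    conn = pymysql.connect(host="203.0.113.5", user="scraper",
                           password="secret", database="scrapes")
    try:
        with conn.cursor() as cur:
            # Assumed schema: results(scraped_at DATETIME, value TEXT)
            cur.execute(
                "INSERT INTO results (scraped_at, value) VALUES (NOW(), %s)",
                ("some scraped value",),
            )
        conn.commit()  # persist the insert
    finally:
        conn.close()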

I'd be curious to hear what others would recommend as the easiest and most appropriate approach, given that this is truly a backend script that needs no user-facing front end.

App Engine is feasible, but only if you limit your use of Selenium to a webdriver.Remote connection out to a hosted browser service such as http://crossbrowsertesting.com/ -- feasible, but messy.
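A minimal sketch of that approach, assuming the Selenium 3-style webdriver.Remote API (newer Selenium versions take an options object instead of desired_capabilities) and a placeholder hub URL -- substitute whatever endpoint and capabilities your remote provider documents:

    from selenium import webdriver

    # Hypothetical remote Selenium endpoint -- replace with your provider's hub URL.
    REMOTE_HUB = "http://hub.example-browser-service.com/wd/hub"

    # webdriver.Remote drives a browser running on the remote service, so no
    # browser binary needs to exist on the App Engine side.
    driver = webdriver.Remote(
        command_executor=REMOTE_HUB,
        desired_capabilities={"browserName": "chrome"},
    )
    try:
        driver.get("https://example.com")
        print(driver.title)  # confirm the remote session is alive
    finally:
        driver.quit()  # always release the remote browser session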

I'd use Compute Engine -- cron is trivial to use on any Linux image; see e.g. http://www.thegeekstuff.com/2009/06/15-practical-crontab-examples/ for practical examples.
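For instance, a crontab entry (added with crontab -e) that runs the scraper at the top of every hour could look like this; the interpreter and script paths are hypothetical placeholders:

    # Run the scraper hourly; append stdout/stderr to a log file for debugging.
    # Adjust the paths to wherever your Python and script actually live.
    0 * * * * /usr/bin/python /home/me/scraper.py >> /home/me/scraper.log 2>&1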
