How do I run daily automated tests for a web scraper?

I maintain a REST API built with Django REST that, internally, scrapes several webpages to retrieve a bunch of information.

I have tests for every endpoint that check whether the scrapers are still working. They connect to the live pages and check that the sources remain unchanged and that everything still works.

I would like to run these tests several times per day, and be notified when any of these scrapers fail. I'm not sure how I should approach this.

I'm looking for a method that allows me to:

  1. Run tests automatically every X hours
  2. Notify me of the results

I've been looking at CI, but I'm not sure if that is the preferable approach here.

If you already have a script that does the testing and notifies you (for example via email or Pushover; implementing this in Python is not too hard), set up a cron job:

Open a terminal and type crontab -e, select your editor if you are asked, and add the following line:

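0 */X    * * *   python /path/to/testscript.py

where X should be replaced by how often you want to test. For example, if you write 3 instead of X, your script will be executed every 3 hours (at 00:00, 03:00, 06:00 and so on). The minute field must be a fixed value such as 0: with * in that position, the script would run every minute during each matching hour.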
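For reference, a script like /path/to/testscript.py could look roughly like the sketch below. It is only an illustration under several assumptions that are not in the answer: the tests are started with manage.py test, results are sent by email (not Pushover) through an SMTP server on localhost, and the addresses and paths are placeholders.

```python
"""Hypothetical testscript.py: run the tests, then email the result."""
import smtplib
import subprocess
import sys
from email.message import EmailMessage

# Assumed placeholder values; replace them with your own.
TEST_COMMAND = [sys.executable, "/path/to/project/manage.py", "test"]
SENDER = "scraper-tests@example.com"
RECIPIENT = "you@example.com"
SMTP_HOST = "localhost"


def run_tests(command):
    """Run the test command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def build_report(passed, output):
    """Build an email whose subject line states the outcome."""
    message = EmailMessage()
    message["Subject"] = "Scraper tests " + ("passed" if passed else "FAILED")
    message["From"] = SENDER
    message["To"] = RECIPIENT
    message.set_content(output or "(no output)")
    return message


def main():
    """Run the tests once and mail the report; return a shell exit code."""
    passed, output = run_tests(TEST_COMMAND)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(build_report(passed, output))
    return 0 if passed else 1
```

To use it from cron, end the file with a call to main(), for example sys.exit(main()) under the usual if __name__ == "__main__": guard. If you only want to hear about failures, send the message only when passed is False.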

You can try django-crontab.

  • Easy to configure.
  • Manage the crons in respective app directories.
  • Configurable via settings module.
  • Integrates with manage.py to easily add, remove and show crons.

NOTE: Works for Django >= 1.8


EDIT

Example configuration and usage:

  • Install via pip: pip install django-crontab
  • Add it to INSTALLED_APPS in the settings module.
    INSTALLED_APPS = ( 'django_crontab', ... )
  • Create a module named crons.py (you can name it anything) inside the project, for example in an app directory.

crons.py (example path /path/to/project/<app_name>/crons.py):

from .models import FooModel

def foo_scheduler():
    # update Foo's bar on each invocation
    foos = FooModel.objects.all()
    for foo in foos:
        foo.bar += 42
        foo.save()

  • Register the cron job in settings.

settings.py:

...

CRONJOBS = (
    # this will recur every 5 mins
    ('*/5 * * * *', '<app_name>.crons.foo_scheduler'),
)

...

  • Finally, add the jobs to the crontab using python manage.py crontab add.
  • Remove them using python manage.py crontab remove.
  • To list all active jobs, use python manage.py crontab show.
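Applied to the question, the scheduled function could run the scraper tests and send a notification only when they fail. The following is a rough sketch, not part of django-crontab itself: run_scraper_tests is a hypothetical name, it assumes the job runs with the project root as the working directory (otherwise use an absolute path to manage.py), and it assumes ADMINS and the email settings are configured so that Django's mail_admins can deliver mail. It could be registered with an entry such as ('0 */3 * * *', '<app_name>.crons.run_scraper_tests') in CRONJOBS.

```python
# <app_name>/crons.py (hypothetical job that runs the scraper tests)
import subprocess
import sys


def _mail_admins(subject, body):
    """Default notifier: email the addresses listed in settings.ADMINS."""
    # Imported lazily so this module can be imported before Django is set up.
    from django.core.mail import mail_admins
    mail_admins(subject, body)


def run_scraper_tests(command=None, notify=_mail_admins):
    """Run the test suite in a subprocess and notify only on failure."""
    # Assumption: manage.py is in the current working directory.
    command = command or [sys.executable, "manage.py", "test"]
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        notify("Scraper tests failed", result.stdout + result.stderr)
    return result.returncode
```

Running the tests in a subprocess keeps a failing or crashing test run from taking down the cron job itself, and the notify parameter makes it easy to swap email for another channel.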

Refer to the django-crontab documentation for additional configuration options.
