I have made a scraper that scrapes some links from a web page, and I want to run it every hour. The scraper lives in a Django app, but Django alone cannot run it every hour, because Django views depend on the request/response cycle. To solve this problem I decided to use a Python library named Celery, and following its documentation I have written celery.py and tasks.py files.
My Django project structure is like this:

```
newsportal
- newsportal
  - settings.py
  - celery.py
  - __init__.py
- news
  - tasks.py
  - views.py
  - models.py
```
celery.py has the following code:

```python
from __future__ import absolute_import

import os

from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'newsportal.settings')

from django.conf import settings  # noqa

app = Celery('newsportal')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)


@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
```
The __init__.py file has the following lines of code:

```python
from __future__ import absolute_import

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app  # noqa
```
and tasks.py has the following lines of code:

```python
from __future__ import absolute_import

from celery import shared_task

from crawler import crawler
from .models import News


@shared_task
def news():
    '''
    scrape all links
    '''
    # Hypothetical placeholder: `source` was never defined in the original code.
    source = 'example-site'
    allnews = []  # stores the dict objects returned by crawler()
    allnews.append(crawler())
    for news_dict in allnews:
        for title, url in news_dict.items():
            # Save all the scraped news in the database
            News.objects.create(title=title, url=url, source=source)
```
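To check the loop logic without Django or Celery, here is a minimal sketch. It assumes `crawler()` returns a dict mapping each headline to its URL (the question doesn't show its return type); `fake_crawler` and `collect_news` are hypothetical stand-ins, and the `(title, url)` rows stand in for the `News.objects.create(...)` calls.

```python
def fake_crawler():
    """Hypothetical stand-in for crawler(); assumed to return {title: url}."""
    return {
        "Headline one": "https://example.com/1",
        "Headline two": "https://example.com/2",
    }


def collect_news(crawl):
    """Run the crawler once and flatten its result into (title, url) rows.

    In the real task, each row would be passed to News.objects.create(...).
    """
    allnews = [crawl()]  # one dict per crawler run
    rows = []
    for news_dict in allnews:
        for title, url in news_dict.items():
            rows.append((title, url))
    return rows


print(collect_news(fake_crawler))
```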
What I want to do is run the news() function above every hour and save its results to the Django database. How can I achieve this?
According to the Celery docs, to save the results returned by the worker we need to install django-celery==3.1.17 and run the migrations, which I have already done.
For the database result backend, according to the Celery docs, we should put the following code in settings.py:

```python
app.conf.update(
    CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
)
```

but when I put this code in the `settings.py` file I got this error:
```
settings.py", line 141, in <module>
    app.conf.update(
NameError: name 'app' is not defined
```
I have already added the following lines to settings.py:

```python
from __future__ import absolute_import
BROKER_URL = 'redis://localhost'
```
The main thing I want to do is run the scraper every hour and store its results. Are there any alternative ways to accomplish this task?
I believe you would use `app.conf.update(...)` in your celery.py if you wanted to add that configuration there; `app` is only defined in celery.py, which is why settings.py raises the NameError. Your `app.config_from_object('django.conf:settings')` call in celery.py indicates that you're loading the configuration settings from your settings.py file, though. So you should just be able to put `CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'` at the end of your settings.py file instead. This should prevent you from getting that error.
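Putting that together, a settings.py excerpt might look like the sketch below, assuming Celery 3.1 with django-celery as in the question. The `CELERYBEAT_SCHEDULE` entry is not part of the answer above; it is one way to get the hourly run via celery beat. The key `'scrape-news-every-hour'` is a hypothetical name, and `'news.tasks.news'` assumes the default task name Celery derives from the module path.

```python
# settings.py (excerpt): plain module-level settings, no `app` object needed,
# because celery.py loads them with app.config_from_object('django.conf:settings').
from datetime import timedelta

BROKER_URL = 'redis://localhost'

# Store task results in the Django database through django-celery.
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'

# Ask celery beat to send the news task once an hour.
CELERYBEAT_SCHEDULE = {
    'scrape-news-every-hour': {  # hypothetical entry name
        'task': 'news.tasks.news',  # assumed auto-generated task name
        'schedule': timedelta(hours=1),
    },
}
```

The schedule only fires while a beat process is running, for example `celery -A newsportal beat` alongside a worker, or a single worker started with `-B`.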
I know this is a little late, but I can highly recommend the django-celery-results package.
Installation is straightforward and the package is recommended by Celery itself. Simply return some output from your task and it will be stored in the database and accessible in the Django admin.
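As a rough sketch of that setup, following the django-celery-results documentation: add the app to `INSTALLED_APPS`, point the result backend at the Django database, and then run `python manage.py migrate django_celery_results`. Note that this package is meant for newer Celery releases (4.x and later), not the django-celery 3.1 setup in the question, so the Celery version would need upgrading too.

```python
# settings.py (excerpt): sketch for django-celery-results.
INSTALLED_APPS = [
    # ... your existing apps, e.g. 'news' ...
    'django_celery_results',
]

# Use the Django database (via django-celery-results) as the result backend.
CELERY_RESULT_BACKEND = 'django-db'
```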