
How to define a view function in the views.py file in Django

I want to create a view function inside the views.py file that runs at a particular interval of time, without depending on a request object. Is that possible in Django? I am doing a simple project that crawls web data using bs4, requests, and Django. So far I am able to crawl the data and present it in my Django views.py.

Crawled data from the different websites follows this format:

news_title = 'were-these-remote-wild-islands'
news_url = 'http://bbc.co.uk/travel/see-the-dark-side-of-climate-change'

and my view function has the following lines of code:

from django.shortcuts import render

from .bbc import bbc_crawler
from .models import News

source = 'bbc'  # assumed label; the original snippet used `source` without defining it

def collect_data(request):
    '''
    Aggregate all the news from each
    news portal.
    '''
    allnews = []
    # bbc_crawler() returns a dict mapping title to url, e.g.
    # {'climate change': 'http://bbc.co.uk', 't': 'http://url.com'}
    allnews.append(bbc_crawler())

    for news in allnews:
        for eachnews, link in news.items():
            # Problem: on every request the same data is pushed to the
            # database. I need it pushed every 5 minutes instead,
            # without depending on this view being called.
            News.objects.create(title=eachnews, url=link, source=source)

    return render(request, 'news/index.html', {'allnews': allnews, 'source': source})

The problem with the above code is that the view function only runs when we visit the URL that points to it, as defined in this urls.py file:

urls.py

from django.conf.urls import url
from . import views

urlpatterns = [
    url(r'^$', views.collect_data, name="index"),
]

Each time I refresh that URL, the same duplicated data gets stored in the database.

I want to run the crawler every 5 minutes and save the crawled data into the Django database, without duplicating data and without depending on the request object.

Where should I run the crawler so that this happens? The current problem is that the data is saved only when we refresh or request the page.
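Independently of how the crawl is scheduled, duplicates can also be ruled out at the database level. A minimal sketch, assuming the News model has the title and url fields shown above, is a uniqueness constraint in models.py:

```
class Meta:
    # inside the News model: the database rejects a second
    # row with the same (title, url) pair
    unique_together = ('title', 'url')
```

With this in place, repeated crawls of unchanged pages raise an IntegrityError instead of silently inserting duplicate rows.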

Views are for answering requests. If you need to do some crawling on an interval, you should configure a Celery task as suggested by @Rohit Jain, or, for trivial stuff, a management command called from cron or supervisor. Save the crawled data in the database, and read it from the view.
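As a sketch of the management-command route (the command name `crawl_news` and the paths are assumptions, not from the original post): put the crawl in `myapp/management/commands/crawl_news.py` and let cron invoke it every five minutes:

```
*/5 * * * * /path/to/venv/bin/python /path/to/project/manage.py crawl_news
```

Because each run is a fresh process, there is no long-lived loop to babysit, and a crash in one run does not prevent the next run from happening.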

First, I want you to rethink your design.

Views are made to respond to the user's request, not to crawl data. You should implement the crawler as another function, independent of the view. Your view should only display the latest entry in the database. It should NOT crawl the data.

Think about this scenario: User1 sends a GET request, which crawls the data and saves it to the database at tick 00:01 (assuming your timed-execution problem is solved). The next crawl should happen at tick 00:06. But if User2 and User3 arrive in between, at ticks 00:02 and 00:03, their GET requests add newly crawled data to the database. You expect 2 entries between 00:01 and 00:06, but because of User2 and User3 there are 4.

So do it like this instead; this is more appropriate.

1. Create a myfun.py in your application directory:

from .bbc import bbc_crawler
from .models import News

def crawl_data():
    allnews = []
    allnews.append(bbc_crawler())
    for news in allnews:
        for eachnews, link in news.items():
            # 'bbc' is assumed here; the original snippet used `source` without defining it
            News.objects.create(title=eachnews, url=link, source='bbc')
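Note that `create()` inserts a new row unconditionally, which is why refreshing duplicated data in the first place. In Django the usual fix is `News.objects.get_or_create(title=eachnews, url=link)`, which only inserts when no matching row exists. Its idempotent behaviour can be sketched with a plain dict standing in for the table (a sketch of the semantics, not the ORM itself):

```python
def get_or_create(table, title, url):
    """Return (row, created): insert only if (title, url) is not already present."""
    # `table` is a plain dict keyed by (title, url), standing in for the News table
    key = (title, url)
    if key in table:
        return table[key], False   # existing row: nothing inserted
    table[key] = {"title": title, "url": url}
    return table[key], True        # new row inserted

news = {}
row, created = get_or_create(news, "climate change", "http://bbc.co.uk")
row, created_again = get_or_create(news, "climate change", "http://bbc.co.uk")
# the second call finds the existing row, so the table still has one entry
```

Calling the real `get_or_create` in the crawl loop makes repeated runs safe even without a database-level constraint.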

2. AFTER starting your web server, explicitly run crawling.py just once:

 python crawling.py

Write crawling.py as follows:

import time
from myfun import crawl_data

while True:
    time.sleep(300)  # 5 minutes
    crawl_data()
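The while/sleep loop above works, but it blocks its process for 300 seconds at a time and offers no clean way to stop it. A slightly more controllable sketch (`run_periodically` is a name made up for this example, not part of the answer) uses a threading.Event so the loop can be interrupted promptly:

```python
import threading

def run_periodically(func, interval, stop_event):
    """Call func, then call it again every `interval` seconds, until stop_event is set."""
    while not stop_event.is_set():
        func()
        # wait() sleeps up to `interval` seconds but returns early
        # as soon as stop_event is set, so shutdown is prompt
        stop_event.wait(interval)

# usage sketch: crawl every 300 seconds until told to stop
# stop = threading.Event()
# threading.Thread(target=run_periodically,
#                  args=(crawl_data, 300, stop), daemon=True).start()
# ... later, to shut down cleanly: stop.set()
```

This still runs in the same process, so for production the cron/Celery approaches from the other answer remain the more robust choice.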

In your view, just show the last entry in the database to any number of users:

def collect_data(request):
    lastentry = News.objects.all().last()
    allnews = lastentry.allnews  # fetch according to your model fields
    source = lastentry.source    # fetch according to your model fields
    return render(request, 'news/index.html', {'allnews': allnews, 'source': source})
