How do I automatically save my scraped data to my database?

I'm using Django 1.5 and Python 2.7 on Windows 7. I have the following view that extracts and displays links from various sources. It works OK, but I don't know how to:

  1. save the data to the database and
  2. sort it by date extracted.

PS: I asked a similar question here, but it looks like I wasn't clear enough:

Python/Django Extract and append only new links

I hope someone could help me here.

views.py:

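import re
import urllib2
import urlparse

from bs4 import BeautifulSoup
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.shortcuts import render_to_response


def foo():
    """Return the job links from the first site as HTML anchor strings."""
    site = "http://www.example.com/portal/jobs"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    soup = BeautifulSoup(jobpass)
    # Turn relative hrefs into absolute URLs before filtering.
    for tag in soup.find_all('a', href=True):
        tag['href'] = urlparse.urljoin('http://www.businessghana.com/portal/', tag['href'])
    return map(str, soup.find_all('a', href=re.compile('.getJobInfo')))


def example():
    """Return the job links from the second site as HTML anchor strings."""
    site = "http://example.com"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    soup = BeautifulSoup(jobpass)
    return map(str, soup.find_all('a', href=re.compile('.display-job')))


# Note: both scrapers run once, when this module is first imported.
foo_links = foo()
example_links = example()


def all_links():
    return foo_links + example_links


def display_links(request):
    name = all_links()
    paginator = Paginator(name, 25)
    page = request.GET.get('page')
    try:
        name = paginator.page(page)
    except PageNotAnInteger:
        # Missing or non-numeric page: show the first page.
        name = paginator.page(1)
    except EmptyPage:
        # Page out of range: show the last page.
        name = paginator.page(paginator.num_pages)

    return render_to_response('jobs.html', {'name': name})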

My template looks like this:

<ol>
{% for link in name %}
  <li>{{ link|safe }}</li>
{% endfor %}
</ol>
<div class="pagination">
  <span class="step-links">
    {% if name.has_previous %}
      <a href="?page={{ name.previous_page_number }}">Previous</a>
    {% endif %}

    <span class="current">
      Page {{ name.number }} of {{ name.paginator.num_pages }}.
    </span>

    {% if name.has_next %}
      <a href="?page={{ name.next_page_number }}">Next</a>
    {% endif %}
  </span>
</div>

My model looks like this:

from django.db import models

class jobLinks(models.Model):
    links = models.URLField()
    pub_date = models.DateTimeField('date retrieved')

    def __unicode__(self):
        return self.links

This is my first programming project, and I can't get this part to work no matter what I try or search for. Any help will be greatly appreciated.

Thanks

I haven't used Python with Django templates in a long time, but this looks like a typical MVC-style project. In any case, you will need to connect to a database, so you will need a Python-compatible database module installed; MySQLdb, for example, is a Python interface to MySQL. You will then need to serialize the data you get back from your web requests and store it through some sort of persistence API, that is, an API that saves the data so it survives between runs. I believe the pickle module handles object serialization in Python.
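To make the "connect to a database and persist the data" idea concrete, here is a minimal sketch. It is not the Django way of doing it: it assumes the standard-library sqlite3 module rather than MySQLdb or the Django ORM, and the table name job_links and the helper functions open_store, save_links and links_by_date are hypothetical names made up for illustration. It stores each link once with the time it was retrieved and reads the links back newest first.

```python
import sqlite3
from datetime import datetime, timedelta


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) job_links table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_links ("
        " link TEXT PRIMARY KEY,"
        " pub_date TEXT NOT NULL)"
    )
    return conn


def save_links(conn, links, retrieved_at=None):
    """Insert only links that are not stored yet; return how many were added."""
    # ISO 8601 strings sort in chronological order, so TEXT is enough here.
    stamp = (retrieved_at or datetime.now()).isoformat()
    added = 0
    for link in links:
        cur = conn.execute(
            "INSERT OR IGNORE INTO job_links (link, pub_date) VALUES (?, ?)",
            (link, stamp),
        )
        added += cur.rowcount  # 1 if inserted, 0 if the link already existed
    conn.commit()
    return added


def links_by_date(conn):
    """Return stored links, most recently retrieved first."""
    rows = conn.execute(
        "SELECT link FROM job_links ORDER BY pub_date DESC, link"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store()
    save_links(conn, ["http://example.com/a", "http://example.com/b"])
    for link in links_by_date(conn):
        print(link)
```

With the jobLinks model from the question, the same two steps would presumably map onto the Django ORM instead: something like jobLinks.objects.get_or_create(links=url, defaults={'pub_date': now}) to save only new links, and jobLinks.objects.order_by('-pub_date') to sort by retrieval date. Note that the view in the question returns whole anchor tags as strings, so the href would have to be extracted before it is stored in a URLField.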
