

Scrapy spider not writing in Postgres

I'm trying to scrape items from several pages of a website into a Postgres database. I've tried different versions of the code, but it still doesn't work; my database is still empty...

How can I scrape items from pages of a website into my Postgres database? What is wrong with my code?

Here is the latest version of the code:

Myspider.py

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import scrapy, os, re, csv
from scrapy.spiders import CrawlSpider, Rule, Spider
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from scrapy.loader import ItemLoader
from scrapy.loader.processors import Join, MapCompose
from scrapy.item import Item, Field
from AHOTU_V2.items import AhotuV2Item 

def url_lister():
    url_list = []
    page_count = 0
    while page_count < 10: 
        url = 'https://marathons.ahotu.fr/calendrier/?page=%s' %page_count
        url_list.append(url)
        page_count += 1 
    return url_list

class ListeCourse(CrawlSpider):
    name = 'ListeCAP_Marathons_ahotu' 
    start_urls = url_lister()

    deals_list_xpath = '//div[@class="list-group col-sm-12 top-buffer"]/a[@class="list-group-item calendar"]'

    item_fields = {
        'nom_course': './/dl/dd[3]/text()',
        'localisation': './/dl/dd[2]/span[1]/text()',
    }


    def parse_item(self, response):
        selector = Selector(response)

        # iterate over deals
        for deal in selector.xpath(self.deals_list_xpath):
            loader = ItemLoader(AhotuV2Item(), selector=deal)

            # define processors
            loader.default_input_processor = MapCompose(unicode.strip)
            loader.default_output_processor = Join()

            # iterate over fields and add xpaths to the loader
            for field, xpath in self.item_fields.iteritems():
                loader.add_xpath(field, xpath)
            yield loader.load_item()  

After hours of looking for a solution, I realised that the method I was using was the wrong one; that's why the spider didn't work.

MySpider.py

#!/usr/bin/env python
#-*- coding: utf-8 -*-

from scrapy.spiders import Spider
(...)

class ListeCourse(Spider):
(...)

I don't see any rule calling parse_item.

You should be using Spider and not CrawlSpider for your class. Change

class ListeCourse(CrawlSpider):

to

class ListeCourse(Spider):
