Scrapy spider not writing in Postgres
I'm trying to scrape items from several pages of a website into a Postgres database. I have tried different versions of the code, but none of them work: my database is still empty.
How can I scrape items from the pages of a website into my Postgres database? What is wrong with my code?
Here is the latest version of the code:
Myspider.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import scrapy, os, re, csv
from scrapy.spiders import CrawlSpider, Rule, Spider
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from scrapy.loader import ItemLoader
from scrapy.loader.processors import Join, MapCompose
from scrapy.item import Item, Field
from AHOTU_V2.items import AhotuV2Item


def url_lister():
    url_list = []
    page_count = 0
    while page_count < 10:
        url = 'https://marathons.ahotu.fr/calendrier/?page=%s' % page_count
        url_list.append(url)
        page_count += 1
    return url_list


class ListeCourse(CrawlSpider):
    name = 'ListeCAP_Marathons_ahotu'
    start_urls = url_lister()
    deals_list_xpath = '//div[@class="list-group col-sm-12 top-buffer"]/a[@class="list-group-item calendar"]'
    item_fields = AhotuV2Item()
    item_fields = {
        'nom_course': './/dl/dd[3]/text()',
        'localisation': './/dl/dd[2]/span[1]/text()',
    }

    def parse_item(self, response):
        selector = Selector(response)
        # iterate over deals
        for deal in selector.xpath(self.deals_list_xpath):
            loader = ItemLoader(AhotuV2Item(), selector=deal)
            # define processors
            loader.default_input_processor = MapCompose(unicode.strip)
            loader.default_output_processor = Join()
            # iterate over fields and add xpaths to the loader
            for field, xpath in self.item_fields.iteritems():
                loader.add_xpath(field, xpath)
            yield loader.load_item()
After hours of looking for a solution, I realised that the method I was using was the wrong one, which is why the spider didn't work.
MySpider.py
#!/usr/bin/env python
#-*- coding: utf-8 -*-
from scrapy.spiders import Spider
(...)
class ListeCourse(Spider):
(...)
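I don't see any rule calling parse_item. A CrawlSpider only invokes callbacks that are named in its rules, so with no rules defined parse_item never runs and no items are produced.
You should be using Spider and not CrawlSpider for your class. Note that a plain Spider passes the responses for start_urls to a method named parse by default, so the callback has to be named parse or be set explicitly on the requests. Change
class ListeCourse(CrawlSpider):
to
class ListeCourse(Spider):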
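The question does not show how the items are supposed to reach Postgres. In Scrapy this is usually the job of an item pipeline, so here is a minimal sketch of one, in case that part is also missing. It assumes psycopg2 is installed; the class name PostgresPipeline, the connection string and the table courses(nom_course, localisation) are all hypothetical and not taken from the original project.

```python
class PostgresPipeline:
    """Hypothetical item pipeline: inserts each scraped item into Postgres."""

    # Assumed connection string and table name; adjust to your own setup.
    dsn = "dbname=ahotu user=postgres host=localhost"
    insert_sql = (
        "INSERT INTO courses (nom_course, localisation) VALUES (%s, %s)"
    )

    def open_spider(self, spider):
        # Called once when the spider starts: open the database connection.
        # Imported here so the module still loads where psycopg2 is missing.
        import psycopg2
        self.conn = psycopg2.connect(self.dsn)

    def close_spider(self, spider):
        # Called once when the spider finishes: release the connection.
        self.conn.close()

    def process_item(self, item, spider):
        # Parameterized query: psycopg2 takes care of quoting the values.
        with self.conn.cursor() as cur:
            cur.execute(
                self.insert_sql,
                (item.get("nom_course"), item.get("localisation")),
            )
        self.conn.commit()
        return item  # pass the item on to any later pipeline
```

A pipeline only runs if it is enabled in the project's settings.py, for example with ITEM_PIPELINES = {"AHOTU_V2.pipelines.PostgresPipeline": 300}, assuming the class lives in AHOTU_V2/pipelines.py. Committing after every item keeps the sketch simple; batching the commits would probably be faster on large crawls.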