Very simple Scrapy crawler not following links
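This is a simple Scrapy spider that crawls yelp.com and extracts data.
I have set Rule(LinkExtractor(allow=('.*')),follow=True,callback="parseBusiness")
so that links are followed and parseBusiness is used as the callback.
However, Scrapy does not follow any links.
Here is the relevant output (the full output is at http://pastebin.com/BkuErvMq ):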
2015-07-14 01:06:22 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-07-14 01:06:25 [scrapy] DEBUG: Crawled (200) <GET http://www.yelp.com/search?find_desc=Hotels&find_loc=San+Francisco%2C+CA&ns=1> (referer: None)
2015-07-14 01:06:26 [scrapy] DEBUG: Crawled (200) <GET http://www.yelp.com/biz/ucsf-medical-center-at-mount-zion-san-francisco> (referer: None)
2015-07-14 01:06:26 [scrapy] INFO: Closing spider (finished)
2015-07-14 01:06:26 [scrapy] INFO: Dumping Scrapy stats:
Here is my code:
import sys
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class Business(scrapy.Item):
    name = scrapy.Field()
    contactNumber = scrapy.Field()
    address = scrapy.Field()

class YelpSpider(CrawlSpider):
    name = "yelp"
    allowed_domains = ["www.yelp.com"]
    start_urls = [
        "http://www.yelp.com/search?find_desc=Hotels&find_loc=San+Francisco%2C+CA&ns=1",
        "http://www.yelp.com/biz/ucsf-medical-center-at-mount-zion-san-francisco"
    ]
    Rule(LinkExtractor(allow=()),follow=True,callback="parseBusiness")

    def parseBusiness(self, response):
        business = Business()
        business['name'] = stripchars(response.xpath('//h1[@itemprop="name"]//text()').extract())
        business['contactNumber'] = stripchars(response.xpath('//span[@itemprop="telephone"]//text()').extract())
        business['address'] = stripchars(response.xpath('//li[@class="address"]//text()').extract())
        yield business
What am I missing here? I want it to follow and scrape all the links.
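You did not set the spider's rules attribute. In your code the Rule(...) call is a bare expression in the class body, so the object is created and then discarded. CrawlSpider only reads the rules attribute, which is why only the two start URLs were crawled (both with referer: None in your log). Assign the rule to rules: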
class YelpSpider(CrawlSpider):
    name = "yelp"
    allowed_domains = ["www.yelp.com"]
    start_urls = [
        "http://www.yelp.com/search?find_desc=Hotels&find_loc=San+Francisco%2C+CA&ns=1",
        "http://www.yelp.com/biz/ucsf-medical-center-at-mount-zion-san-francisco"
    ]
    rules = [
        Rule(LinkExtractor(allow=('.*')),follow=True,callback="parseBusiness")
    ]
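
To illustrate why the assignment matters, here is a minimal sketch in plain Python with no Scrapy dependency. FakeRule and FakeCrawlSpider are hypothetical stand-ins, not Scrapy classes; the sketch only assumes that the base class looks up its rules through the rules class attribute, as CrawlSpider does.

```python
class FakeRule:
    """Hypothetical stand-in for scrapy's Rule; it only records a callback name."""

    def __init__(self, callback):
        self.callback = callback


class FakeCrawlSpider:
    """Hypothetical stand-in for CrawlSpider: it only reads the `rules` attribute."""

    rules = ()  # default: no rules, so nothing beyond the start URLs is followed

    def callbacks(self):
        # Collect the callback names of all rules visible on the class.
        return [rule.callback for rule in self.rules]


class BrokenSpider(FakeCrawlSpider):
    # Bare expression: the object is created and immediately discarded,
    # so the inherited empty `rules` is still in effect.
    FakeRule(callback="parseBusiness")


class FixedSpider(FakeCrawlSpider):
    # Assignment: the rule becomes a class attribute the base class can see.
    rules = (FakeRule(callback="parseBusiness"),)


print(BrokenSpider().callbacks())  # []
print(FixedSpider().callbacks())   # ['parseBusiness']
```

The same reasoning applies to the real spider: a Rule that is never bound to rules is invisible to the crawler.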