Scraping a website with scrapy
Hey, I've just started using scrapy and was trying it out on the website "diy.com", but I can't seem to get the CrawlSpider to follow links or scrape any data. I think it might be my regex, but I can't see anything wrong with it.
Any help will be appreciated.

from scrapy.spider import Spider
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
from items import PartItem


class DIY_spider(CrawlSpider):
    name = 'diy_cat'
    allowed_domains = ['diy.com']
    start_urls = [
        "http://www.diy.com/nav/decor/tiles/wall-tiles"
    ]
    rules = (
        Rule(SgmlLinkExtractor(allow=(r'/(nav)/(decor)/(\w*)/(.*)(\d*)$', ), deny=(r'(.*)/(jsp)/(.*)')), callback='parse_item', follow=True),
    )

def parse_items(self, response):
    sel = Selector(response)
    tests = []
    test = PartItem()
    if sel.xpath('//*[@id="fullWidthContent"]/div[2]/dl/dd[1]/ul[1]/li[3]/text()'):
        price = sel.xpath('//*[@id="fullWidthContent"]/div[2]/dl/dd[1]/ul[1]/li[3]/text()')
    else:
        price = sel.xpath('//dd[@class="item_cta"]/ul[@class="fright item_price"]/li/text()').extract()
    if not price:
        return test
    return test

Your rule names parse_item as the callback, but the actual callback method is named parse_items. Rename one of them so the two match. Additionally, the indentation of the parse_items function is incorrect, but that could simply be a formatting issue from pasting the code in.

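To illustrate why the name mismatch matters, here is a minimal sketch of looking up a callback by its string name. It assumes, as I believe is the case for CrawlSpider rules, that a string callback is resolved as an attribute of the spider. SpiderSketch and resolve_callback are hypothetical stand-ins written for this example, not part of scrapy.

```python
class SpiderSketch:
    """Hypothetical stand-in for a spider; not a real scrapy class."""

    def parse_items(self, response):
        return response


def resolve_callback(spider, name):
    # Look the callback up by name; a name with no matching
    # attribute resolves to None instead of raising an error.
    return getattr(spider, name, None)


spider = SpiderSketch()
missing = resolve_callback(spider, "parse_item")   # name used in the rule
found = resolve_callback(spider, "parse_items")    # name of the real method

print(missing)
print(callable(found))
```

With a lookup like this, the misspelled name fails quietly, which would match the symptom of a crawl that runs but never scrapes anything.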
Besides @Talvalin's note, you are not getting the actual prices.
Try this version of parse_item:

def parse_item(self, response):
    sel = Selector(response)
    price_list = sel.xpath('//span[@class="onlyPrice"]/text()').extract()
    for price in price_list:
        if price:
            item = PartItem()
            item['price'] = price
            yield item
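
Since the question suspected the regex, one quick way to rule it in or out is to run the allow and deny patterns against a few URLs outside of scrapy. This is only a sketch: it assumes the link extractor applies the patterns with search-style matching against the full URL, and every URL except the start URL from the question is made up for illustration. would_follow is a hypothetical helper, not a scrapy function.

```python
import re

# Patterns copied from the rule in the question
ALLOW = re.compile(r'/(nav)/(decor)/(\w*)/(.*)(\d*)$')
DENY = re.compile(r'(.*)/(jsp)/(.*)')


def would_follow(url):
    # A link is kept when it matches the allow pattern
    # and does not match the deny pattern.
    return bool(ALLOW.search(url)) and not DENY.search(url)


urls = [
    "http://www.diy.com/nav/decor/tiles/wall-tiles",  # start URL from the question
    "http://www.diy.com/nav/garden/sheds",            # hypothetical, outside /nav/decor/
    "http://www.diy.com/nav/decor/tiles/jsp/page",    # hypothetical, hits the deny pattern
]

for url in urls:
    print(url, would_follow(url))
```

If URLs you expect to be crawled pass a check like this, the regex is probably not the culprit and the callback name is the more likely cause.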