
scrapy returning first item

I'm learning Scrapy for someone, but my spider only returns the first item on the page. Can anyone tell me what I'm doing wrong?

Here is my code:

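from scrapy import Selector, Spider
from scrapy.exceptions import CloseSpider

from ..items import RuvillaItem  # assumed: the item class lives in the project's items.py


class RuvillaSpider(Spider):

    name = "RuvillaSpider"
    allowed_domains = ["ruvilla.com"]
    start_urls = ["https://www.ruvilla.com/men/footwear.html?dir=desc&limit=45&order=news_from_date"]

    def parse(self, response):
        # Select the product listing on the page.
        products = Selector(response).xpath('//div[@class="category-products"]')

        # Stop the crawl when a page has no product listing.
        if not products:
            raise CloseSpider('RuvillaSpider: DONE, NO MORE PAGES.')

        for product in products:
            item = RuvillaItem()
            # extract()[0] keeps only the first match of each expression.
            item['name'] = product.xpath('ul/li/div/div[1]/a/@title').extract()[0]
            item['link'] = product.xpath('ul/li/div/div[1]/a/@href').extract()[0]
            item['image'] = product.xpath('ul/li/div/div[1]/a/img/@src').extract()[0]
            yield item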

Your XPath seems to put only one element into the `products` variable.

Try:

$ scrapy shell "https://www.ruvilla.com/men/footwear.html?dir=desc&limit=45&order=news_from_date"
In[1]: response.xpath('//div[@class="category-products"]')
Out[1]: [<Selector xpath='//div[@class="category-products"]' data=u'<div class="category-products">\n<div cla'>]

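So your XPath is not matching each individual item; it is matching the container that holds them. To fix this, use an XPath that selects each product's own element (the version below also follows the link to the next page):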

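from scrapy import Request, Selector


def parse(self, response):
    # Select one <li> per product inside the listing, not the listing container itself.
    products = Selector(response).xpath('//div[@class="category-products"]//li[contains(@class,"item")]')

    for product in products:
        item = dict()
        # The leading ".//" keeps each lookup relative to the current product.
        item['name'] = product.xpath('.//a/@title').extract_first()
        item['link'] = product.xpath('.//a/@href').extract_first()
        item['image'] = product.xpath('.//a/img/@src').extract_first()
        yield item

    # The pager entry right after the "current" page links to the next page.
    next_page = response.xpath("//li[@class='current']/following-sibling::li[1]/a/@href").extract_first()
    if next_page:
        # Resolve relative hrefs against the current page URL.
        yield Request(response.urljoin(next_page))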

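Your XPath is wrong.

Use this XPath instead, and drop the leading `ul/li/` from the per-item expressions, since each `product` is then already an `<li>`:

'//div[@class="category-products"]/ul/li'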

