Scrapy only returning first result from the page

I have the following spider. It is simple: it just tries to parse the product title, URL, and price from a single category page. The problem is that the spider only gets the first result from the page and nothing more. I don't understand why; can anyone explain this behavior?

URL: website to scrape

Spider:

import scrapy
from scrapy.crawler import CrawlerProcess


class VapeSpider(scrapy.Spider):
    name = "vape"

    # custom_settings = {
    #     "FEED_FORMAT": "csv",
    #     "FEED_URI": "vape.csv",
    #     "LOG_FILE": "vape.log",
    # }
    def start_requests(self):
        yield scrapy.Request(
            "https://buyeliquidonline.com.au/product-category/geek-vape/",
            callback=self.parse,
        )

    def parse(self, response):
        for prod in response.css("ul.products:nth-child(2)"):
            yield {
                "title": prod.css("h2.woocommerce-loop-product__title")
                .css("a::text")
                .get()
            }


process = CrawlerProcess()

process.crawl(VapeSpider)

process.start()

The problem is in the CSS selector. ul.products:nth-child(2) matches the product list container itself, so it returns a single selector and the for loop runs only once; .get() then returns just the first title found inside that container. Each product sits in its own li tag, so you need to select those with ul.products:nth-child(2) li and loop over the result:

import scrapy
from scrapy.crawler import CrawlerProcess


class VapeSpider(scrapy.Spider):
    name = "vape"

    # custom_settings = {
    #     "FEED_FORMAT": "csv",
    #     "FEED_URI": "vape.csv",
    #     "LOG_FILE": "vape.log",
    # }
    def start_requests(self):
        yield scrapy.Request(
            "https://buyeliquidonline.com.au/product-category/geek-vape",
            callback=self.parse,
        )

    def parse(self, response):
        for prod in response.css("ul.products:nth-child(2) li"):
            yield {
                "title": prod.css("h2.woocommerce-loop-product__title").css("a::text").get()
            }


process = CrawlerProcess()

process.crawl(VapeSpider)

process.start()

Output:

{'title': 'Geekvape Aegis Boost Empty Pod Cartridge 3.7ml'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Boost Pod Kit Luxury Edition 1500mah'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Hero Pod Kit 1200mah 4ml'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Hero Replacement Pod Cartridge'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Legend Kit With Z Sub Ohm Tank 5ml'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Max Starter Kit'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Solo 100W Starter Kit'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis X 200w Starter Kit W/ Zeus'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aero 5ml replacement glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Alpha 4ml  Replacement Glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Boost Replacement Coils'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Cerberus 5.5ml replacement glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape G Coil Zeus Tank'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Super Mesh Coils'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Wenax K1 Pod System'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Zeus replacement glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Zeus sub ohm tank'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape ZX RTA'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Wenax replacement pods'}
