
Scraping an e-commerce website using Scrapy

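I am new to Scrapy. I wrote a spider for an e-commerce website and need to scrape the details listed below from it, but the script is not working. Could anyone please help me fix it?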

Website: https://savedbythedress.com/collections/maternity-tops

import scrapy    

class DressSpider(scrapy.Spider):
    name = 'dress'
    allowed_domains = ['savedbythedress.com']
    start_urls = ['https://savedbythedress.com/collections/maternity-tops']

    def parse(self, response):
        #scraped all product links
        domain = "https://savedbythedress.com"
        link_products = response.css('div[class="product-info-inner"] ::attr(href)').get()
        for link in link_products:
            product_link = domain + link   
            yield{
                'product_link': product_link.css('div[class="product-info-inner"] ::attr(href)').get(),
            }      
            yield scrapy.Request(url=product_link, callback=self.parse_contents)

    def parse_contents(self, response):
        #scrape needed information
        productlink = response.url
        yield{
            'product_title' : response.css('.sbtd-product-title ::text').get(),
            'product_price' : response.css('.product-price ::text').get(),
            'product_review' : response.css('.Natsob ::text').getall()
        }




   

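Use yield response.follow(page_url, self.parse_contents) and it will work for you. In the original parse(), .get() returns only the first matching href as a single string, so the for loop iterates over its characters, and product_link is a plain str with no .css() method. Loop over the product containers instead and let response.follow resolve each relative link: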

import scrapy


class DressSpider(scrapy.Spider):
    name = 'dress'
    allowed_domains = ['savedbythedress.com']
    start_urls = ['https://savedbythedress.com/collections/maternity-tops']

    def parse(self, response):
        # Visit each product card and follow its link to the product page
        for product in response.css('div.product-info'):
            page_url = product.css('div[class="product-info-inner"] ::attr(href)').get()
            if page_url:
                # response.follow resolves relative URLs against response.url
                yield response.follow(page_url, self.parse_contents)

    def parse_contents(self, response):
        # Scrape the needed information from the product page
        yield {
            'product_link': response.url,
            'product_title': response.css('.sbtd-product-title ::text').get(),
            'product_price': response.css('.product-price ::text').get(),
            'product_review': response.css('.Natsob ::text').getall(),
        }
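
To see why the loop in the question never reaches a product page, here is a minimal sketch in plain Python that needs no Scrapy install. It assumes the selector's .get() hands back one href string while .getall() hands back a list of them; the helper name build_links and the sample hrefs (taken from product URLs on this page) are only for illustration.

```python
DOMAIN = "https://savedbythedress.com"


def build_links(hrefs):
    """Prefix every item of `hrefs` with the domain, as the question's loop does."""
    return [DOMAIN + href for href in hrefs]


# .get() yields a single str, so iterating over it walks character by character.
first_only = "/products/red-plaid-top-11"
print(build_links(first_only)[:3])

# .getall() yields a list of str, so iterating over it walks href by href.
all_hrefs = [
    "/products/red-plaid-top-11",
    "/products/cream-mock-neck-maternity-top-11",
]
print(build_links(all_hrefs))
```

The first call produces one bogus URL per character of the href, which is what the original spider was requesting.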

@Thinesh, here is a working solution that reads the title, price, and URL directly from the listing page:

import scrapy
class DressSpider(scrapy.Spider):
    name = 'dress'
    allowed_domains = ['savedbythedress.com']
    start_urls = ['https://savedbythedress.com/collections/maternity-tops']

    def parse(self, response):
        for product in response.css('div.product-index.desktop-3.tablet-half.mobile-half'):
            product_url = product.css('div.product-info>div>div>a::attr(href)').get()
            abs_url = f'https://savedbythedress.com{product_url}'
            yield {
                'product_title': product.css('div.product-info-inner> div > a >span ::text').get(),
                'product_price': product.css('div.prod-price::text').get(),
                'url':abs_url}

Output:

{'product_title': 'Heather Gray Maternity Top with Long Sleeve and Elbow Patches', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/heather-gray-maternity-top-with-long-sleeve-and-elbow-patches-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Beige Maternity Top with Long Sleeves', 'product_price': '$ 39.00', 'url': 'https://savedbythedress.com/products/beige-maternity-top-with-long-sleeves-utm_campaign-la1'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Charcoal Long Sleeve Maternity Nursing Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/charcoal-long-sleeve-maternity-nursing-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Heather Gray Mock Neck Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/heather-gray-mock-neck-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Burgundy Maternity Top with Long Sleeve and Elbow Patches', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/burgundy-maternity-top-with-long-sleeve-and-elbow-patches-11'}        
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Gray Ribbed Long Sleeve Maternity Top', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/gray-ribbed-long-sleeve-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Cream Mock Neck Maternity Top', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/cream-mock-neck-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Black and White Plaid Maternity Top', 'product_price': '$ 39.00', 'url': 'https://savedbythedress.com/products/black-and-white-plaid-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Red Plaid Maternity Top', 'product_price': '$ 37.00', 'url': 'https://savedbythedress.com/products/red-plaid-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Heather Gray Long Sleeve Maternity Top', 'product_price': '$ 34.00', 'url': 'https://savedbythedress.com/products/heather-gray-long-sleeve-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Mint Ribbed Long Sleeve Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/mint-ribbed-long-sleeve-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'White Polka Dot Maternity Top', 'product_price': '$ 38.00', 'url': 'https://savedbythedress.com/products/white-polka-dot-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Black Printed Maternity Top', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/black-printed-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Sage Ribbed Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/sage-ribbed-maternity-top-utm_campaign-la1'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Dusty Pink Ribbed Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/dusty-pink-ribbed-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Blue and White Off Shoulder Maternity Top', 'product_price': '$ 42.00', 'url': 'https://savedbythedress.com/products/blue-and-white-off-shoulder-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Blue Floral Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/blue-floral-materrnity-top-11'}
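
The solution above builds the absolute URL with an f-string, which assumes every href is a site-relative path and that a link was always found. As a hedged alternative, the sketch below joins each href against the page URL with the standard library's urllib.parse.urljoin, which, to my understanding, is what Scrapy's response.urljoin relies on. The helper name absolutize and the PAGE_URL constant are hypothetical.

```python
from urllib.parse import urljoin

PAGE_URL = "https://savedbythedress.com/collections/maternity-tops"


def absolutize(href, base=PAGE_URL):
    """Return an absolute URL for `href`, or None when no link was found."""
    if not href:
        return None
    # urljoin leaves absolute URLs alone and resolves relative ones against base.
    return urljoin(base, href)


print(absolutize("/products/red-plaid-top-11"))
print(absolutize("https://savedbythedress.com/products/red-plaid-top-11"))
print(absolutize(None))
```

Inside the spider the equivalent call would be response.urljoin(product_url), guarded by a check that product_url is not None.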
