Scraping an e-commerce website using Scrapy
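I'm new to Scrapy. I wrote a script for an e-commerce website and need to scrape the details listed below from it, but the script isn't working. Could anyone please help me fix it?
Website: https://savedbythedress.com/collections/maternity-tops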
import scrapy


class DressSpider(scrapy.Spider):
    name = 'dress'
    allowed_domains = ['savedbythedress.com']
    start_urls = ['https://savedbythedress.com/collections/maternity-tops']

    def parse(self, response):
        #scraped all product links
        domain = "https://savedbythedress.com"
        link_products = response.css('div[class="product-info-inner"] ::attr(href)').get()
        for link in link_products:
            product_link = domain + link
            yield{
                'product_link': product_link.css('div[class="product-info-inner"] ::attr(href)').get(),
            }
            yield scrapy.Request(url=product_link, callback=self.parse_contents)

    def parse_contents(self, response):
        #scrape needed information
        productlink = response.url
        yield{
            'product_title' : response.css('.sbtd-product-title ::text').get(),
            'product_price' : response.css('.product-price ::text').get(),
            'product_review' : response.css('.Natsob ::text').getall()
        }
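The loop above goes wrong in two places. A selector's .get() returns only the first match as a single string (or None), so "for link in link_products" walks the characters of one URL; and product_link is then a plain str, which has no .css() method. The stdlib-only sketch below reproduces both effects without Scrapy; the hrefs are hypothetical stand-ins for what the selector would match, and a plain list stands in for what .getall() returns.

```python
# Hypothetical relative hrefs, standing in for everything the CSS selector matches.
matched_hrefs = ["/products/top-a", "/products/top-b"]

first_only = matched_hrefs[0]    # roughly what .get() returns: one string
all_links = list(matched_hrefs)  # roughly what .getall() returns: a list

domain = "https://savedbythedress.com"

# Bug: looping over a single string walks its characters, not URLs.
broken = [domain + ch for ch in first_only]

# Fix: loop over the list of hrefs instead.
fixed = [domain + href for href in all_links]

# A plain str has no .css() method, so product_link.css(...) fails as well.
print(hasattr(first_only, "css"))
print(broken[:2])
print(fixed)
```

In the spider itself this means iterating over .getall() (or over the product containers, as the answers below do) and never calling .css() on a string.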
Use yield response.follow(page_url, self.parse_contents) to request each product page; it resolves the relative link for you. This should work:
import scrapy


class DressSpider(scrapy.Spider):
    name = 'dress'
    allowed_domains = ['savedbythedress.com']
    start_urls = ['https://savedbythedress.com/collections/maternity-tops']

    def parse(self, response):
        # Iterate over each product card and follow its link
        for link in response.css('div.product-info'):
            page_url = link.css('div[class="product-info-inner"] ::attr(href)').get()
            if page_url:  # .get() returns None when a card has no link
                # response.follow resolves the relative href against response.url
                yield response.follow(page_url, self.parse_contents)

    def parse_contents(self, response):
        # Scrape the needed information from the product page
        yield {
            'product_link': response.url,
            'product_title': response.css('.sbtd-product-title ::text').get(),
            'product_price': response.css('.product-price ::text').get(),
            'product_review': response.css('.Natsob ::text').getall(),
        }
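How parse and parse_contents cooperate: each request yielded from parse is scheduled, and when its response arrives Scrapy calls the callback attached to it, whose yielded dicts become scraped items. response.follow also resolves a relative href against the page URL (Scrapy's Response.urljoin builds on urllib.parse.urljoin). The following is a toy, synchronous, stdlib-only model of that flow, not Scrapy's real scheduler; the fake site, its URLs and the crawl() helper are all hypothetical.

```python
from collections import deque
from urllib.parse import urljoin

# Hypothetical in-memory site: URL -> the data a callback would extract.
FAKE_SITE = {
    "https://shop.example/collections/tops": {"hrefs": ["/products/a", "/products/b"]},
    "https://shop.example/products/a": {"title": "Top A", "price": "$ 29.00"},
    "https://shop.example/products/b": {"title": "Top B", "price": "$ 30.00"},
}


def parse(url, page):
    # Listing page: yield one follow-up "request" per product link,
    # resolving the relative href against the page URL.
    for href in page["hrefs"]:
        yield ("request", urljoin(url, href), parse_contents)


def parse_contents(url, page):
    # Product page: yield one "item".
    yield ("item", {"product_link": url,
                    "product_title": page["title"],
                    "product_price": page["price"]})


def crawl(start_url):
    # FIFO queue of (url, callback) pairs, loosely mimicking a scheduler.
    queue = deque([(start_url, parse)])
    items = []
    while queue:
        url, callback = queue.popleft()
        for kind, *payload in callback(url, FAKE_SITE[url]):
            if kind == "request":
                queue.append((payload[0], payload[1]))
            else:
                items.append(payload[0])
    return items


for item in crawl("https://shop.example/collections/tops"):
    print(item)
```

The real framework downloads concurrently and may return items in a different order; the model only illustrates that requests feed callbacks and callbacks produce items.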
@Thinesh, here is a working solution:
import scrapy


class DressSpider(scrapy.Spider):
    name = 'dress'
    allowed_domains = ['savedbythedress.com']
    start_urls = ['https://savedbythedress.com/collections/maternity-tops']

    def parse(self, response):
        for product in response.css('div.product-index.desktop-3.tablet-half.mobile-half'):
            product_url = product.css('div.product-info>div>div>a::attr(href)').get()
            abs_url = f'https://savedbythedress.com{product_url}'
            yield {
                'product_title': product.css('div.product-info-inner> div > a >span ::text').get(),
                'product_price': product.css('div.prod-price::text').get(),
                'url': abs_url,
            }
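This version reads everything from the listing page in a single request instead of visiting each product page, so it yields title, price and URL directly from parse (and does not collect the review field the question extracted from product pages). One caveat: the f-string assumes every card has a root-relative href. If the selector matches nothing, .get() returns None and the URL becomes "https://savedbythedress.comNone"; an already-absolute href would be doubled. A small sketch of a guard, where the helper name absolute_product_url is hypothetical:

```python
from urllib.parse import urljoin

BASE_URL = "https://savedbythedress.com"


def absolute_product_url(href):
    """Return an absolute URL for a scraped href, or None if nothing matched."""
    # Selector .get() returns None when the selector matches nothing.
    if href is None:
        return None
    # urljoin handles root-relative and already-absolute hrefs alike.
    return urljoin(BASE_URL, href.strip())


print(absolute_product_url("/products/red-plaid-top-11"))
print(absolute_product_url(None))
# The pitfall being avoided: an f-string happily formats None.
print(f"{BASE_URL}{None}")
```

Inside the callback, response.urljoin(product_url) resolves against the page URL in the same way; the None check is still worth keeping.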
Output:
{'product_title': 'Heather Gray Maternity Top with Long Sleeve and Elbow Patches', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/heather-gray-maternity-top-with-long-sleeve-and-elbow-patches-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Beige Maternity Top with Long Sleeves', 'product_price': '$ 39.00', 'url': 'https://savedbythedress.com/products/beige-maternity-top-with-long-sleeves-utm_campaign-la1'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Charcoal Long Sleeve Maternity Nursing Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/charcoal-long-sleeve-maternity-nursing-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Heather Gray Mock Neck Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/heather-gray-mock-neck-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Burgundy Maternity Top with Long Sleeve and Elbow Patches', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/burgundy-maternity-top-with-long-sleeve-and-elbow-patches-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Gray Ribbed Long Sleeve Maternity Top', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/gray-ribbed-long-sleeve-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Cream Mock Neck Maternity Top', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/cream-mock-neck-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Black and White Plaid Maternity Top', 'product_price': '$ 39.00', 'url': 'https://savedbythedress.com/products/black-and-white-plaid-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Red Plaid Maternity Top', 'product_price': '$ 37.00', 'url': 'https://savedbythedress.com/products/red-plaid-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Heather Gray Long Sleeve Maternity Top', 'product_price': '$ 34.00', 'url': 'https://savedbythedress.com/products/heather-gray-long-sleeve-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Mint Ribbed Long Sleeve Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/mint-ribbed-long-sleeve-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'White Polka Dot Maternity Top', 'product_price': '$ 38.00', 'url': 'https://savedbythedress.com/products/white-polka-dot-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Black Printed Maternity Top', 'product_price': '$ 29.00', 'url': 'https://savedbythedress.com/products/black-printed-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Sage Ribbed Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/sage-ribbed-maternity-top-utm_campaign-la1'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Dusty Pink Ribbed Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/dusty-pink-ribbed-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Blue and White Off Shoulder Maternity Top', 'product_price': '$ 42.00', 'url': 'https://savedbythedress.com/products/blue-and-white-off-shoulder-maternity-top-11'}
2021-11-02 22:40:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://savedbythedress.com/collections/maternity-tops>
{'product_title': 'Blue Floral Maternity Top', 'product_price': '$ 30.00', 'url': 'https://savedbythedress.com/products/blue-floral-materrnity-top-11'}