[英]I'm not able to scrape data
我正在使用scrapy從網站上抓取數據。 這是我的代碼
import scrapy
class ShopSpider(scrapy.Spider):
name = 'shop'
allowed_domains = ['https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers']
start_urls = ['http://https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers/']
custom_settings = {
'FEED_URI': 'tmp/shop.csv'
}
def parse(self, response):
titles = response.css('img::attr(title)').extract()
images = response.css('img::attr(data-img)').extract()
prices = response.css('.p_price::text').extract()
discounts = response.css('.prd_discount::text').extract()
for item in zip(titles, prices, images, discounts):
scraped_info = {
'title': item[0],
'price': item[1],
'image_urls': [item[2]], # Set's the url for scrapy to download images
'discount': item[3]
}
yield scraped_info
請檢查我哪里做錯了? 另外,我想在滾動時抓取所有數據。 那么它應該在我們滾動之前獲取所有數據嗎? 那么我該怎么做呢?
您有以下問題:
allowed_domain
(只需要域);start_urls
(http 兩次,最后是斜線);parse
函數中產生項目。在此處檢查固定代碼:
import scrapy
class ShopSpider(scrapy.Spider):
name = 'shop'
allowed_domains = ['shopclues.com']
start_urls = ['https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers']
def parse(self, response):
titles = response.css('img::attr(title)').extract()
images = response.css('img::attr(data-img)').extract()
prices = response.css('.p_price::text').extract()
discounts = response.css('.prd_discount::text').extract()
for item in zip(titles, prices, images, discounts):
scraped_info = {
'title': item[0],
'price': item[1],
'image_urls': [item[2]], # Set's the url for scrapy to download images
'discount': item[3]
}
yield scraped_info
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.