简体   繁体   中英

Scrapy xpath is not working perfectly, return empty data

I am trying to scrape the book title price and author from vitalsource.com.

I successfully extracted the title, author and ISBN information but I can't get the price from the webpage.

I don't understand why I can't get the data since they are all on the same webpage.

I googled and tried many hours and now it's 4:43 am here, I am tired and despair, please help me.

please check the image for more detail. The xpath is working fine in the blue area, but not working in the red area

import scrapy
from VitalSource.items import VitalsourceItem
from scrapy.spiders import SitemapSpider

class VsSpider(scrapy.Spider):
    name = 'VS'
    allowed_domains = ['VitalSource.com']
    start_urls = ['https://www.vitalsource.com/products/cengage-unlimited-1st-edition-instant-access-1-cengage-unlimited-v9780357700006']

    def parse(self, response):
        item = VitalsourceItem()
        item['Ebook_Title'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/h1/text()').extract()[1].strip()
        item['Ebook_Author'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/p/text()').extract()[0].strip()
        item['Ebook_ISBN'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/ul/li[2]/h2/text()').extract()[0].strip()
        item['Ebook_Price'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/div/span[1]/span[3]/span[2]/text()')
        print(item)
        return item

Result Information:

{
 'Ebook_Author': 'by: Cengage Unlimited',
 'Ebook_ISBN': 'Print ISBN: \n 9780357700037, 0357700031',
 'Ebook_Price': [],
 'Ebook_Title': 'Cengage Unlimited, 1st Edition [Instant Access], 1 term (4 months)'
}

I am not sure if you want to strictly use xpath, but I will post how it's done both with xpath and css selector:

css:

response.css('.u-pull-sixth--right+ span::text').get().strip()

xpath:

response.xpath('/html[1]/body[1]/div[2]/main[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/span[1]').xpath('//span[@class]//span[2]/text()').get().strip()

Result:

{'Ebook_Price': '119.99'}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM