
Forming an XPath selector in Scrapy

I'm trying to pull the product names from this page:

https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html

I can't find an XPath expression that returns a useful, specific result.

Apologies for my first post being such a beginner question :(

import scrapy


class V12Spider(scrapy.Spider):
    name = 'v12'
    start_urls = ['https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html']

    def parse(self, response):
        yield {
            'price': response.xpath('//span[@id="product-price-26901"]/text()'),
            'name': response.xpath('//h3[@class="product-name"]/a/text()'),
        }

For name, I expected to get the names of the items in the h3 tags with class product-name, but instead it generates multiple rows of data='\\r\\n

(Whilst we're at it: for price, is there any way to pull out only the numerical values?)

You can solve this by calling get() on the XPath result to extract the matched text as a string, and then calling strip() on that string to remove the surrounding whitespace. I tried something like this:

name = response.xpath('//h3[@class="product-name"]/a/text()').get()

This gives

'\r\n                                RED CHILLI VOLTAGE                            '

Then calling

name.strip()

gives

'RED CHILLI VOLTAGE'

So you can replace the expression for name with

name = response.xpath('//h3[@class="product-name"]/a/text()').get().strip()

The same approach works for price: just add .get().strip() to the end of that expression.
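On the question of pulling out only the numerical value: once the price text has been extracted as a string, a regular expression can pick out the number. The following is a minimal sketch using only the standard library. The function name parse_price and the sample string are hypothetical, and it assumes prices use a dot as the decimal separator and commas only as thousands separators.

```python
import re


def parse_price(raw):
    """Return the first number found in raw as a float, or None if there is none."""
    # One digit, then any mix of digits and commas, then an optional decimal part.
    match = re.search(r'\d[\d,]*(?:\.\d+)?', raw)
    if match is None:
        return None
    # Drop thousands separators before converting, e.g. "1,234.50" becomes 1234.5.
    return float(match.group().replace(',', ''))


# Hypothetical stand-in for the string returned by .get() on the price XPath.
sample = '\r\n        £84.99        '
price = parse_price(sample)
```

Scrapy selectors also have a re_first() method that applies a regular expression directly to the matched text, which may save the separate get() call if you only need the number.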

Hopefully this helps. Also read about the .getall() method, which returns every match as a list of strings rather than only the first one: https://docs.scrapy.org/en/latest/topics/selectors.html
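Since the page lists several products, getall() is probably what you want for the names. It returns a list, so strip() has to be applied to each element. Here is a small sketch of that cleanup step. The helper name clean_texts and the second sample entry are made up for illustration, and the list stands in for what getall() would return. Note also that get() returns None when nothing matches, so chaining .strip() directly onto it raises an AttributeError in that case, whereas getall() simply returns an empty list.

```python
def clean_texts(raw_texts):
    """Strip whitespace from each string and drop entries that end up empty."""
    stripped = (text.strip() for text in raw_texts)
    return [text for text in stripped if text]


# Stand-in for:
#   response.xpath('//h3[@class="product-name"]/a/text()').getall()
raw_names = [
    '\r\n                                RED CHILLI VOLTAGE                            ',
    '\r\n        ',                      # whitespace-only text node
    '\r\n      EXAMPLE SHOE      ',      # made-up second product
]
names = clean_texts(raw_names)
```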


 