Question

I am trying to scrape data from https://octopart.com/electronic-parts/integrated-circuits-ics, specifically the data behind the Specs button. I tried to get the product names with the code below, but it doesn't work.

class SpecSpider(scrapy.Spider):
    name = 'specName'

    start_urls = ['https://octopart.com/electronic-parts/integrated-circuits-ics']
    custom_settings = {
        'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
    }

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formxpath="//form[@class='btn-group']",
            clickdata={"value": "serp-grid"},
            callback=self.scrape_pages,
        )

    def scrape_pages(self, response):
        #open_in_browser(response)
        items = SpecItem()

        for product in response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']"):

            name = product.xpath(".//tr/td[class='matrix-col-part']/a[class='nowrap']/text()").extract()
            items['ProductName'] = ''.join(name).strip()

            price = product.xpath("//tr/td['4']/div[class='small']/text()").extract()
            items['Price'] = ''.join(price).strip()

            yield items

This XPath does not match anything: response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']")

Any suggestions?

Answer 1

Your XPath syntax is wrong:

//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']

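An attribute test needs "@" before the attribute name. Without it, class is read as a child element named "class", so the predicate never matches. Write @class instead: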

//div[@class='inner-body']/div[@class='serp-wrap-all']/..

Also, there is no table with the class 'matrix-table' on the page linked above.

Try something like:

//div[@class='inner-body']/div[@class='serp-wrap-all']//*[contains(@class,'matrix-table')]
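The difference between class and @class in a predicate can be checked on a small document. This is a minimal sketch, not the asker's spider: it uses Python's standard-library xml.etree.ElementTree instead of Scrapy so that it runs without extra packages, and the markup is a made-up stand-in that only borrows the class names from the question. ElementTree supports only a subset of XPath (no contains()), so only the "@" point is shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: class names are taken from the question,
# the structure is assumed and is not the real page.
DOC = """
<html><body>
  <div class="inner-body">
    <div class="serp-wrap-all">
      <table class="matrix-table"><tr><td>part A</td></tr></table>
    </div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(DOC)

# Without "@": the predicate looks for a child element named <class>
# whose text is 'inner-body'. No such element exists here.
without_at = root.findall(".//div[class='inner-body']")

# With "@": the predicate tests the class attribute of each <div>.
with_at = root.findall(".//div[@class='inner-body']/div[@class='serp-wrap-all']")

print(len(without_at), len(with_at))
```

The first query comes back empty and the second finds the inner div, which is the same behaviour the question runs into with response.xpath.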

Answer 2

If you only want the top-level product name, use the CSS selector

.serp-card-pdp-link

and extract its text.

The median price comes from the CSS selector

.avg-price-faux-btn

You can apply a CSS selector in Scrapy with .css(selector).

Question

2 answers

solution1
1 2019-03-19 10:49:14

solution2
0 2019-03-18 07:44:06
