python/ xpath/ web-scraping/ scrapy

I am trying to scrape data from https://octopart.com/electronic-parts/integrated-circuits-ics, specifically the view shown after clicking the Specs button. I tried to get the product names with the code below, but it doesn't work.

import scrapy
from scrapy import FormRequest


class SpecSpider(scrapy.Spider):
    name = 'specName'

    start_urls = ['https://octopart.com/electronic-parts/integrated-circuits-ics']
    custom_settings = {
        'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
    }

    def parse(self, response):
        return FormRequest.from_response(response, formxpath="//form[@class='btn-group']", clickdata={"value": "serp-grid"}, callback=self.scrape_pages)

    def scrape_pages(self, response):
        # open_in_browser(response)
        items = SpecItem()

        for product in response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']"):

            name = product.xpath(".//tr/td[class='matrix-col-part']/a[class='nowrap']/text()").extract()
            items['ProductName'] = ''.join(name).strip()

            price = product.xpath("//tr/td['4']/div[class='small']/text()").extract()
            items['Price'] = ''.join(price).strip()

            yield items

This XPath matches nothing: response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']")

Any suggestions?

You are using the wrong XPath syntax:

//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']

Attributes must be prefixed with "@", so write "@class" instead of "class":

//div[@class='inner-body']/div[@class='serp-wrap-all']/..
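
To see why the missing "@" matters, here is a minimal sketch on a made-up HTML fragment (not the real Octopart markup). It uses Python's standard-library xml.etree.ElementTree so it runs without Scrapy; ElementTree supports only a subset of XPath, but it reads this predicate the way XPath does: "class" without "@" is a test on a child element named class, so the expression is valid and simply matches nothing.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only; the real page structure may differ.
SAMPLE = """
<html><body>
  <div class="inner-body">
    <div class="serp-wrap-all"><a class="nowrap">PART-A</a></div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Without "@", the predicate means: a <div> that has a child element
# <class> whose text is 'inner-body'. No such element exists here.
without_at = root.findall(".//div[class='inner-body']")

# With "@", the predicate tests the class attribute of the <div> itself.
with_at = root.findall(
    ".//div[@class='inner-body']/div[@class='serp-wrap-all']/a"
)

print(len(without_at))
print([a.text for a in with_at])
```

The same rule should apply to response.xpath(...) in Scrapy, which is why the original expression raises no error and yields no items.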

Also, the page at the link above contains no table with the class 'matrix-table'.

Try using something like:

//div[@class='inner-body']/div[@class='serp-wrap-all']//*[contains(@class,'matrix-table')]

If you just want the top-level product name, use the CSS selector

.serp-card-pdp-link

and extract its text.

The median price comes from the CSS selector

.avg-price-faux-btn

In Scrapy, you can apply a CSS selector with response.css(selector).


