
Finding CSS selectors for a specific website

I have been trying to scrape product names and prices from this website ( https://dentalspeed.com/?fbclid=IwAR1_gjjWAevu1pgikjwLUqeFXzjBRo7A93uXFSIAasxlvl97ptEorNP1fDo ), but I can't get the CSS selectors right. I have also tried SelectorGadget, and I know HTML and CSS well enough to have read the page source myself. I think the selectors are correct, but for some reason they extract no data.

    def parse(self, response):
        all_div = response.css('div.collection-product')

        for div in all_div:
            # Create a fresh item per product so yielded items do not share state.
            items = DenItem()
            product_name = div.css(".collection-product-name font font::text").extract()
            _new_price = div.css('div.collection-product-price > a > font > font::text').extract()
            _new_price = [s.replace("$", "").replace(",", "") for s in _new_price]
            _old_price = div.css("main#setembro section:nth-child(5) > div > div > div > div > ul > div.owl-wrapper-outer > div > div:nth-child(3) > li > div > div.collection-product-price-content > p.collection-product-price > del > font > font::text").extract()
            _old_price = [n.replace("R $", "").replace(",", "") for n in _old_price]
            items['product_name'] = product_name
            # Fall back to '0' when no price was extracted.
            items['_new_price'] = _new_price or '0'
            items['_old_price'] = _old_price or '0'

            yield items

The content is returned dynamically from another URL, so it is not in the HTML that your selectors run against. You can find the request in the browser's network tab when you refresh the page with F5.

import requests

r = requests.get('https://dentalspeed.com/vitrines/app-vitrine__home--estetica').json()
print(r)

Depending on the full list of products you want, you may need to track down other URLs in the same way.
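Once you have the JSON, you still need to locate the product entries inside it. The structure of this endpoint's response is not shown here, so the following is only a rough sketch: it walks any parsed JSON value and yields every dict that has a given set of keys. The helper name `iter_dicts_with_keys`, the key names `name` and `price`, and the sample payload are all made up for illustration. Inspect the real response to see which fields it actually uses.

```python
from typing import Any, Iterator


def iter_dicts_with_keys(node: Any, required_keys: set) -> Iterator[dict]:
    """Yield every dict nested anywhere inside `node` that has all `required_keys`."""
    if isinstance(node, dict):
        if required_keys.issubset(node.keys()):
            yield node
        # Keep descending: matching dicts may themselves contain further matches.
        for value in node.values():
            yield from iter_dicts_with_keys(value, required_keys)
    elif isinstance(node, list):
        for value in node:
            yield from iter_dicts_with_keys(value, required_keys)


# Hypothetical payload standing in for the real response (layout is an assumption).
sample = {
    "vitrine": {
        "items": [
            {"name": "Product A", "price": "10,00"},
            {"name": "Product B", "price": "25,50", "old_price": "30,00"},
            {"banner": "not a product"},
        ]
    }
}

products = list(iter_dicts_with_keys(sample, {"name", "price"}))
for product in products:
    print(product["name"], product["price"])
```

With the real data, `r` from the `requests` snippet above would take the place of `sample`, and the key set would have to match whatever field names the endpoint really returns.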
