Choosing appropriate locators when scraping dynamic content with Python and Selenium

Question

I am trying to understand the correct way to select specific elements of a webpage using Python and Selenium. I am unsure what determines which approach to take, such as XPath or CSS selectors.

https://dutchie.com/embedded-menu/revolutionary-clinics-somerville/menu

 <a class="consumer-product-card__StyledLink-ncbvk2-1 jpGhIo" href="/embedded-menu/berkshire-roots/menu/cbd-tincture-2-1-225mg"> <span>CBD Tincture 2:1 225mg Details</span> <div class="product-card__Container-sc-7s6mw-0 iWHVJj"> <div class="product-card__Content-sc-7s6mw-1 cfcIOW"> <div class="product-information__Container-sc-65h5ke-0 ejVwks"> <img class="product-information__StyledProductImage-sc-65h5ke-1 jupjtQ" width="218" height="218" src="https://images.dutchie.com/0f253b35120facc1465b75b08bfd4d66?auto=format&amp;dpr=1&amp;bg=FFFFFF&amp;crop=faces&amp;fit=fill&amp;w=218&amp;h=218&amp;ixlib=react-7.2.0" alt="" srcset="https://images.dutchie.com/0f253b35120facc1465b75b08bfd4d66?auto=format&amp;dpr=2&amp;bg=FFFFFF&amp;crop=faces&amp;fit=fill&amp;w=218&amp;h=218&amp;ixlib=react-7.2.0 2x, https://images.dutchie.com/0f253b35120facc1465b75b08bfd4d66?auto=format&amp;dpr=3&amp;bg=FFFFFF&amp;crop=faces&amp;fit=fill&amp;w=218&amp;h=218&amp;ixlib=react-7.2.0 3x"> <div class="product-information__ProductInfo-sc-65h5ke-2 bwhblJ"> <div class="product-information__Price-sc-65h5ke-7 eEqLUB">$36.95</div> <div class="product-information__BrandContainer-sc-65h5ke-5 dlSlvE list-only"> <div class="product-information__Brand-sc-65h5ke-6 ftehWE">Berkshire Roots</div> </div> <div class="product-information__TitleContainer-sc-65h5ke-3 fOoVwz list-only false"> <div class="product-information__Title-sc-65h5ke-4 eBIyJW --line2">CBD Tincture 2:1 225mg</div> </div> <div class="product-information__TitleContainer-sc-65h5ke-3 fOoVwz mobile-and-card"> <div class="product-information__Title-sc-65h5ke-4 eBIyJW">CBD Tincture 2:1</div> <div class="product-information__Title-sc-65h5ke-4 eBIyJW --line2"> 225mg</div> </div> <div class="product-information__DetailsContainer-sc-65h5ke-9 ifqkuO"> <div class="product-information__Strain-sc-65h5ke-10 eWkod --high-cbd">High CBD</div> <div class="product-information__PotencyInfo-sc-65h5ke-14 gUReQf"><b>THC:&nbsp;</b>72.3 
mg&nbsp;&nbsp;|&nbsp;&nbsp;<b>CBD:&nbsp;</b>160.3 mg</div> </div> </div> </div> <div class="product-weights__Container-nwgli1-0 gwUwAi"> <div class="product-weights__Weights-nwgli1-1 kiObrJ"> <div aria-label="Add 0.41g to cart for $36.95" data-cy="product-card-weight" class="weight__Container-sc-11f1l3-2 dNvnhd"> <div class="weight__Price-sc-11f1l3-4 ZtHqz">$36.95</div> <div class="weight__IconContainer-sc-11f1l3-1 zqIJt"> <svg xmlns="http://www.w3.org/2000/svg" width="11" height="11" viewBox="0 0 10 10"> <path fill="#A6ACB3" fill-rule="nonzero" d="M9.176 5c0-.407-.031-.723-.438-.723l-3.022.007.007-3.022c0-.407-.326-.428-.722-.438-.407 0-.723.03-.722.436l.003 3.012-3.022.007c-.406 0-.426.325-.436.722-.01.396.031.722.438.722l3.022-.007.003 3.012c0.407.326.427.723.438.407 0.722-.03.721-.437l-.003-3.011 3.012.003c.406 0.437-.315.436-.722z"></path> </svg> </div> </div> <div class="product-weights__Fill-nwgli1-2 dtfdkt"></div> </div> </div> </div> </div> </a>

How would I use a loop to access every "consumer-product-card" without having scrolled to the bottom of the page? Or would I need to force the page to scroll first? Is the "consumer-product-card" approach correct, or would XPath make more sense? In either case, I find it hard to understand which is ideal and why, or even how to select one instance, then the next, and so on until I reach the end.

Thank you.

Answer 1

This question is somewhat opinion-based.

I would use the simplest CSS selector that uniquely identifies the element. In my experience XPath tends to be slower, more brittle, and harder to write good selectors for. But there is no single "correct" approach.
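To make the comparison concrete, here is a minimal sketch that builds a CSS selector and an XPath expression for the same card link from the question's HTML. It assumes the suffixes in the class names (such as `-ncbvk2-1 jpGhIo`) are generated at build time and may change, so it matches only the readable part. The helper names are made up for illustration.

```python
def css_class_contains(fragment: str) -> str:
    """CSS selector for any element whose class attribute contains `fragment`."""
    return f'[class*="{fragment}"]'


def xpath_class_contains(fragment: str, tag: str = "*") -> str:
    """XPath counterpart of css_class_contains, optionally limited to one tag."""
    return f"//{tag}[contains(@class, '{fragment}')]"


def css_attr_equals(name: str, value: str) -> str:
    """CSS selector for an exact attribute value, e.g. a data-cy hook."""
    return f'[{name}="{value}"]'


# Readable parts of the class names and attributes in the question's HTML.
CARD_LINK = css_class_contains("consumer-product-card__StyledLink")
CARD_LINK_XPATH = xpath_class_contains("consumer-product-card__StyledLink", tag="a")
WEIGHT_BUTTON = css_attr_equals("data-cy", "product-card-weight")

print(CARD_LINK)
print(CARD_LINK_XPATH)
print(WEIGHT_BUTTON)
```

Either string can then be passed to Selenium, for example `driver.find_elements(By.CSS_SELECTOR, CARD_LINK)` or `driver.find_elements(By.XPATH, CARD_LINK_XPATH)`. Both should match the same elements, so the choice mostly comes down to which one is easier to read and maintain.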

I'm a little confused regarding the goal of the rest of the question. I think we would need some more detail and the code you've used to attempt this.

Also, your HTML is formatted on one line and very hard to view.

Answer 2

To find all cards use:

from selenium.webdriver.common.by import By
driver.find_elements(By.XPATH, "//div[contains(@class,'consumer-product-card__InViewContainer-ncbvk2-0 dWfGpk')]")

Then use the links I gave you in the previous question as examples.
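For the "one instance, then the next" part of the question, a rough sketch of looping over the matched cards with XPath might look like this. It assumes each card contains a `product-information__Title-...` div, as in the question's HTML, and the function name is hypothetical. The strategy is written as a plain string so the sketch has no import-time dependency on Selenium.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH

CARDS = "//div[contains(@class, 'consumer-product-card__InViewContainer')]"
# The leading "." keeps the search inside the current card. Without it,
# "//div[...]" searches the whole document again and every iteration
# returns the first title on the page.
TITLE_IN_CARD = ".//div[contains(@class, 'product-information__Title-')]"


def card_titles(driver):
    """Return the text of the first title element found inside each card."""
    titles = []
    for card in driver.find_elements(XPATH, CARDS):
        titles.append(card.find_element(XPATH, TITLE_IN_CARD).text)
    return titles
```

Calling `card_titles(driver)` after the page has loaded returns one string per card that is currently in the DOM. Note that `.text` only returns visible text, so a title hidden by CSS would come back empty.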

UPDATE

Solution to start with:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

CARD = ".product-card__Content-sc-7s6mw-1.cfcIOW"
TITLE = ".product-information__TitleContainer-sc-65h5ke-3.fOoVwz.list-only"

# Selenium 4 takes the chromedriver path through a Service object.
driver = webdriver.Chrome(service=Service(executable_path='/snap/bin/chromium.chromedriver'))
driver.get('https://dutchie.com/embedded-menu/revolutionary-clinics-somerville/menu')

# Wait up to 30 seconds for at least one card; the condition returns the matches.
wait = WebDriverWait(driver, 30)
cards = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, CARD)))

# Look up the title relative to each card, not the whole page.
data = [card.find_element(By.CSS_SELECTOR, TITLE).text for card in cards]
for name in data:
    print(name)

driver.quit()

It waits for the cards to appear and prints their names. Scrolling and extracting other elements are separate questions. I found CSS selectors more suitable for this case.

The result is three items:

Rick Simpson Oil (RSO)
Live Sugar - Purple Pineapple Express
Live Sugar - Gelato #33
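Only three names came back, which would be consistent with the page rendering cards lazily as they come into view (an assumption; the `InViewContainer` class name hints at it). If so, one possible approach is to keep scrolling until the number of matching cards stops growing. The sketch below assumes the window itself scrolls, rather than an inner container or an iframe, and that cards stay in the DOM once rendered. The function name and parameters are made up for illustration.

```python
import time

CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def scroll_until_stable(driver, selector, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until no new matches for `selector` appear.

    Returns the matching elements found after the last scroll.
    """
    seen = -1
    elements = driver.find_elements(CSS, selector)
    for _ in range(max_rounds):
        if len(elements) == seen:
            break  # the previous scroll produced nothing new
        seen = len(elements)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for lazily rendered cards
        elements = driver.find_elements(CSS, selector)
    return elements
```

A call such as `cards = scroll_until_stable(driver, '[class*="product-card__Content"]')` could replace the single `wait.until(...)` lookup once the first cards are present. The fixed `pause` is a simplification; an explicit wait on the card count would be more reliable on a slow connection.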

solution1
1 2021-05-01 02:24:04

solution2
1 ACCEPTED 2021-05-01 02:28:52
