Can't find page elements using Selenium python

Question

I am trying to extract the review text from this page.

Here's a condensed version of the html shown in my chrome browser inspector:

<div id="module_product_review" class="pdp-block module">
    <div class="lazyload-wrapper ">
        <div class="pdp-mod-review" data-spm="ratings_reviews" lazada_pdp_review="expose" itemid="1615006548" data-nosnippet="true" data-aplus-ae="x1_490e4591" data-spm-anchor-id="a2o42.pdp_revamp.0.ratings_reviews.508466b1OJjCoH">
            <div>...</div>
            <div>...</div>
            <div>
                <div class="mod-reviews">
                    <div class="item">
                        <div class="top">...</div>
                        <div class="middle">...</div>
                        <div class="item-content">
                            <div class="content" data-spm-anchor-id="a2o42.pdp_revamp.ratings_reviews.i3.508466b1OJjCoH">Slim and light. feel good. better if providing 16G version.</div>
                            <div class="review-image">...</div>
                            <div class="skuInfo">Color Family:MYSTIC SILVER</div>
                            <div class="bottom">...</div>
                            <div class="dialogs"></div>
                        </div>
                        <div class="seller-reply-wrapper">...</div>
                    </div>
                    <div class="item">...</div>
                    <div class="item">...</div>
                    <div class="item">...</div>
                    <div class="item">...</div>
                </div>
            </div>
        </div>
    </div>
</div>

I'm trying to extract the "Slim and light. feel good. better if providing 16G version." text from the class="content" element.

But when I try to retrieve the id="module_product_review" element using Selenium in python, this is what I get instead:

<div class="pdp-block module" id="module_product_review">
    <div class="lazyload-wrapper">
        <div class="lazy-load-placeholder">
            <div class="lazy-load-skeleton">
            </div>
        </div>
    </div>
</div>

This is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

op = webdriver.ChromeOptions()
op.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
module_product_review = driver.find_element(By.ID, "module_product_review")
html = module_product_review.get_attribute("outerHTML")
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())

I thought it might be because I was retrieving the element before it had fully loaded, so I tried sleeping for 30 seconds before calling find_element(), but I still get the same result. As far as I can tell, it's not an issue of iframes or shadow roots either.

Is there some other issue that I'm missing?

Answer 1

The element whose text you want is initially outside the visible viewport, so the page has not lazy-loaded its content yet. You first have to scroll that element into view.
Also, since you are running in headless mode, you should set the window size explicitly. The default headless window is much smaller than a typical desktop browser window.
Finally, use explicit waits with expected conditions so that you access elements only once they are ready.
This should work better:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

op = webdriver.ChromeOptions()
op.add_argument('--headless')
# Set the window size before creating the driver; options added afterwards have no effect
op.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
# Wait until the review module exists in the DOM
element = wait.until(EC.presence_of_element_located((By.ID, "module_product_review")))
time.sleep(1)
# Scroll the module into view so its lazy-loaded content gets rendered
actions.move_to_element(element).perform()
module_product_review = wait.until(EC.visibility_of_element_located((By.ID, "module_product_review")))
# Now you can do what you want here
html = module_product_review.get_attribute("outerHTML")
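
If a single scroll is not enough to trigger the lazy loader, one option is to keep scrolling the module into view until the review elements actually show up, instead of relying on a fixed sleep. The following is only a rough sketch of that idea: the helper name wait_for_reviews and the CSS selector are my own assumptions based on the HTML in the question, and I have not verified them against the live page. It only needs an object with execute_script() and find_elements(), which a Selenium driver provides.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR, so no selenium import is needed here
CSS = "css selector"

# Assumed selector, derived from the condensed HTML in the question
REVIEW_SELECTOR = "#module_product_review .mod-reviews .item .content"

# Scrolls the review module to the middle of the viewport, if it exists
SCROLL_SCRIPT = (
    "const el = document.getElementById('module_product_review');"
    "if (el) { el.scrollIntoView({block: 'center'}); }"
)


def wait_for_reviews(driver, timeout=20.0, poll=0.5):
    """Scroll to the review module repeatedly until review elements appear.

    Returns the list of matching elements, or raises TimeoutError if none
    are found before the timeout (in seconds) runs out.
    """
    deadline = time.monotonic() + timeout
    while True:
        driver.execute_script(SCROLL_SCRIPT)
        found = driver.find_elements(CSS, REVIEW_SELECTOR)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError("review content did not load in time")
        time.sleep(poll)
```

With the driver from above you would call it as reviews = wait_for_reviews(driver) and then read reviews[0].text. Whether scrolling alone triggers the loading on this particular site is something you would have to check.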

Also, to find that specific element and get its text, you could use a more precise locator, like this:

your_text = wait.until(EC.visibility_of_element_located((By.XPATH, "(//div[@id='module_product_review']//div[@class='item']//div[@class='content'])[1]"))).text

You can use this after scrolling, as described above.
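
If you would rather keep working on the outerHTML string, as in your original code, you can pull every review text out of it once the real content has loaded. Below is a minimal sketch that uses only the standard library html.parser instead of BeautifulSoup. The class and function names are made up for illustration, and the class names it looks for ("item", "content") are taken from the HTML in the question, so they are assumptions about the page rather than a stable API.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text of each <div class="content"> nested inside a <div class="item">."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._open_divs = []        # class lists of the currently open <div> tags
        self._buffer = None         # text pieces of the review being read, if any
        self._content_depth = None  # nesting depth of the current content div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        inside_item = any("item" in open_classes for open_classes in self._open_divs)
        self._open_divs.append(classes)
        if self._buffer is None and inside_item and "content" in classes:
            self._buffer = []
            self._content_depth = len(self._open_divs)

    def handle_endtag(self, tag):
        if tag != "div" or not self._open_divs:
            return
        # Closing the content div itself finishes the current review
        if self._buffer is not None and len(self._open_divs) == self._content_depth:
            self.reviews.append("".join(self._buffer).strip())
            self._buffer = None
        self._open_divs.pop()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_reviews(html):
    """Return the review texts found in the given HTML string."""
    parser = ReviewTextParser()
    parser.feed(html)
    parser.close()
    return parser.reviews
```

Calling extract_reviews(html) on the placeholder markup from the question returns an empty list, which also makes it a cheap check for whether the reviews have loaded yet.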

Answer 1: accepted, 2022-02-24 08:09:01
