
How to get all elements from webpage using Selenium?

My Python code only ever finds the first article in the HTML, so it prints the same link every time. How do I get all the article tags from the HTML? Thanks.

Python code:

links = driver.find_elements_by_tag_name("article")
for i in links:
    if driver.find_element_by_xpath("//div[@class='inner-article']/a//div[@class='sold_out_tag']").get_attribute("innerHTML") == "sold out":
        print("sold out")
        link = ((driver.find_element_by_xpath("//div[@class='inner-article']/a").get_attribute("href")))
        print(link)
    else:
        print("available")
time.sleep(5)
driver.quit()

HTML:

<article>
   <div class="inner-article"><a style="height:81px;" 
   href="/shop/jackets/jly8dgwqu/w10m2pybx"><img width="81" height="81" 
   src="//d17ol771963kd3.cloudfront.net/139432/vi/AHP1l8fMIcA.jpg" 
   alt="Ahp1l8fmica"><div class="sold_out_tag">sold out</div></a></div>
</article>
<article>
   <div class="inner-article"><a style="height:81px;" 
   href="/shop/jackets/jly8dgwqu/w10m2pybx"><img width="81" height="81" 
   src="//d17ol771963kd3.cloudfront.net/139432/vi/AHP1l8fMIcA.jpg" 
   alt="Ahp1l8fmica"><div class="sold_out_tag">sold out</div></a></div>
</article>

To do this, you'll need Selenium's ActionChains class, which lets you hover the mouse over elements. Import it at the top of your script:

from selenium.webdriver.common.action_chains import ActionChains

Then proceed as follows:

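from selenium.webdriver.common.by import By  # Selenium 4.3+ removed the find_element(s)_by_* helpers

articles = driver.find_elements(By.TAG_NAME, "article")
for article in articles:
    # Hover over the article so its hidden "sold out" label becomes visible to .text
    ActionChains(driver).move_to_element(article).perform()
    # Search inside this article only, not the whole page
    if article.find_element(By.TAG_NAME, "a").text == "sold out":
        print("sold out")
        link = article.find_element(By.XPATH, "div/a").get_attribute("href")
        print(link)
    else:
        print("available")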

Each article WebElement supports the same find methods as the driver, but searches only inside that element. An XPath that starts with a double slash (//) searches the whole DOM no matter which element you call it on (which is why it finds the same element every time), so you need a relative path instead: a direct-child path such as div/a, or one starting with .// to match any descendant of the current element.
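To see the difference without launching a browser, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for Selenium (its find() understands a small XPath subset). The markup is hypothetical: it is simplified from the question's HTML, with distinct hrefs so the effect is visible and a self-closed img so the XML parser accepts it. unscoped_links and scoped_links are illustrative helper names, not Selenium APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on the question's HTML.
# The hrefs differ (unlike the question) and <img> is self-closed for the XML parser.
PAGE = """
<body>
  <article>
    <div class="inner-article"><a href="/shop/jackets/item-1"><img alt="one"/><div class="sold_out_tag">sold out</div></a></div>
  </article>
  <article>
    <div class="inner-article"><a href="/shop/jackets/item-2"><img alt="two"/></a></div>
  </article>
</body>
""".strip()

root = ET.fromstring(PAGE)


def unscoped_links(root):
    # Mimics calling driver.find_element(...) inside the loop: every lookup
    # starts from the document root, so it always returns the first matching <a>.
    return [root.find(".//div[@class='inner-article']/a").get("href")
            for _ in root.iter("article")]


def scoped_links(root):
    # Mimics article.find_element(By.XPATH, "div/a"): each lookup is relative
    # to the current <article>, so each one finds that article's own link.
    return [article.find("div/a").get("href") for article in root.iter("article")]


print(unscoped_links(root))  # ['/shop/jackets/item-1', '/shop/jackets/item-1']
print(scoped_links(root))    # ['/shop/jackets/item-1', '/shop/jackets/item-2']
```

The unscoped version reproduces the question's symptom (the same link twice), while the scoped version returns one link per article.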

Edit: By default, the element with the sold out text has the CSS property display: none;, so Selenium's .text does not include it. To make the sold out text visible, you need to hover the mouse over each element, and luckily Selenium can do that too. I've also changed my original code a bit: items that aren't sold out don't have a div with the sold_out_tag class, so looking that div up on those items would raise an error.
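A similar standard-library sketch, again using hypothetical simplified markup, shows how to treat a missing sold_out_tag div as "available" instead of crashing. In ElementTree, find() returns None when nothing matches; in Selenium the analogous approach is find_elements(), which returns an empty list rather than raising NoSuchElementException. stock_status is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the first article is sold out, the second
# has no sold_out_tag div at all (as described for items still in stock).
PAGE = """
<body>
  <article>
    <div class="inner-article"><a href="/shop/jackets/item-1"><img alt="one"/><div class="sold_out_tag">sold out</div></a></div>
  </article>
  <article>
    <div class="inner-article"><a href="/shop/jackets/item-2"><img alt="two"/></a></div>
  </article>
</body>
""".strip()


def stock_status(root):
    """Map each article's href to 'sold out' or 'available'."""
    status = {}
    for article in root.iter("article"):
        href = article.find("div/a").get("href")
        # find() returns None when the tag is missing, much like Selenium's
        # find_elements() returns [] instead of raising.
        tag = article.find(".//div[@class='sold_out_tag']")
        sold_out = tag is not None and (tag.text or "").strip() == "sold out"
        status[href] = "sold out" if sold_out else "available"
    return status


print(stock_status(ET.fromstring(PAGE)))
# {'/shop/jackets/item-1': 'sold out', '/shop/jackets/item-2': 'available'}
```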

Based on the HTML you shared, if you want to print the hrefs of the nodes with the text sold out, you can use the following code block:

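from selenium.webdriver.common.by import By

articles = driver.find_elements(By.TAG_NAME, "article")
for article in articles:
    # find_elements returns [] (instead of raising) when an article has no sold-out tag
    tags = article.find_elements(By.XPATH, ".//div[@class='inner-article']/a//div[@class='sold_out_tag']")
    if tags and "sold out" in tags[0].get_attribute("innerHTML"):
        print("sold out")
        # The leading "." keeps the search inside the current <article>
        print(article.find_element(By.XPATH, ".//div[@class='inner-article']/a").get_attribute("href"))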
