Extract a hyperlink from a website - Selenium

I have been trying to solve this issue for some time and tried several solutions posted here before opening this question.

I am currently attempting to run a scraper with the following code:

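from pathlib import Path
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

website = 'https://www.abitareco.it/nuove-costruzioni-milano.html'
path = Path().joinpath('util', 'chromedriver')
driver = webdriver.Chrome(path)
driver.get(website)

main = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "p1")))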

The hyperlinks I am after contain the word scheda in their href:

i = driver.find_element_by_xpath('.//a[contains(@href, "scheda")]')
i.text

My first issue is that find_element_by_xpath returns only a single hyperlink, and my second issue is that it is not extracting anything so far.

I'd appreciate any help and/or guidance.

You need to use find_elements instead:

for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.text)

Note that find_elements returns a list of web elements, whereas find_element returns a single web element.

If you are specifically looking for the href attribute, you can try the code below:

for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.get_attribute('href'))
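
One practical consequence of the list-versus-single-element difference: when nothing matches, find_elements gives back an empty list, while find_element raises NoSuchElementException. Here is a minimal sketch of how that can be used to probe for a link safely. The helper name first_href_or_none is hypothetical (not part of Selenium), and it assumes a Selenium 4 style driver whose find_elements accepts a locator strategy and a value:

```python
def first_href_or_none(driver, xpath):
    """Return the href of the first matching element, or None if nothing matches."""
    # "xpath" is the string value of Selenium's By.XPATH constant,
    # so this sketch needs no Selenium import of its own.
    matches = driver.find_elements("xpath", xpath)
    if not matches:
        # find_elements returned an empty list; find_element would have
        # raised NoSuchElementException in the same situation.
        return None
    return matches[0].get_attribute("href")
```

With a live driver you would call it as first_href_or_none(driver, ".//a[contains(@href, 'scheda')]") and check the result for None before using it.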

Looking at the website, there are two issues.

  1. You want to find all elements, not just one, so you need to use find_elements, not find_element.
  2. The anchors actually don't have any text in them, so .text won't return anything.

Assuming what you want is to scrape the URLs of all these links, you can use .get_attribute('href') instead of .text, like so:

url_list = driver.find_elements(By.XPATH, './/a[contains(@href, "scheda")]')
for i in url_list:
    print(i.get_attribute('href'))

This finds all web elements that match your criteria and stores them in a list. I used print only as an example; you will likely want to do more with the links than print them.
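
If the goal is to keep the URLs rather than print them, something like the following could work. This is only a sketch: collect_hrefs is a hypothetical helper name, and the assumption that the page may repeat the same link or contain anchors with an empty href is mine, not something confirmed for this site. It relies only on each element having a get_attribute method, as Selenium web elements do:

```python
def collect_hrefs(elements):
    """Return the unique, non-empty href values of `elements`, in page order."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        # Skip anchors without an href and links already collected.
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs
```

Usage would be url_list = collect_hrefs(driver.find_elements(By.XPATH, './/a[contains(@href, "scheda")]')), after which url_list is a plain list of strings you can loop over, save to a file, or pass to driver.get one at a time.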
