
Extracting links in a sequence from a table in a webpage using Selenium in Python

I want to extract the links to the PDFs from this page using Selenium in Python.

I managed to extract the entire table that contains the rows and the links to the pdfs.

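from selenium.webdriver.common.by import By

# driver is an already-created WebDriver; company_link is the company page URL
driver.get(company_link)
# Follow the announcements link found in the page heading
announcement_link = driver.find_element(By.XPATH, '//*[@id="heading1"]/h1/a').get_attribute('href')
driver.get(announcement_link)
table = driver.find_element(By.XPATH, '//*[@id="lblann"]/table/tbody/tr[4]/td')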

I am looking for the shortest possible way to build a list of all the PDF links, in the order they appear on the page. How do I do that?


On the page you provided, every PDF link carries the class tablebluelink, which makes the links easy to target. The following XPath expression selects the href attribute of every a element whose class attribute has the value tablebluelink:

//a[@class='tablebluelink']/@href

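Selenium can only return elements, not attribute nodes, so passing an XPath that ends in /@href to it raises an error. Select the a elements instead and read each href with get_attribute. The find_elements_by_xpath helper has also been removed in recent Selenium 4 releases, so use find_elements with By.XPATH:

elems = driver.find_elements(By.XPATH, "//a[@class='tablebluelink']")
pdf_links = [elem.get_attribute('href') for elem in elems]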
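
If the page also uses the tablebluelink class for links that are not PDFs, or lists the same file twice, the hrefs may need a small clean-up pass. The sketch below is one possible way to do that, not something taken from the page itself: the helper name collect_pdf_links is hypothetical, and it assumes that a PDF link is one whose URL path ends in ".pdf". It keeps the links in the order Selenium returns them, which is document order.

```python
from urllib.parse import urlparse


def collect_pdf_links(driver, xpath="//a[@class='tablebluelink']"):
    """Return the PDF hrefs matched by `xpath`, in page order, without duplicates.

    `driver` is an already-created Selenium WebDriver (or a WebElement such as
    the table, in which case a relative XPath like ".//a[...]" should be used).
    """
    links = []
    seen = set()
    # "xpath" is the string value of Selenium's By.XPATH constant.
    for anchor in driver.find_elements("xpath", xpath):
        href = anchor.get_attribute("href")
        # Skip anchors without an href and links already collected.
        if not href or href in seen:
            continue
        # Assumption: a PDF link has a URL path ending in ".pdf".
        if urlparse(href).path.lower().endswith(".pdf"):
            seen.add(href)
            links.append(href)
    return links
```

After driver.get(announcement_link), calling pdf_links = collect_pdf_links(driver) would give the list. If every tablebluelink anchor on the page is already a PDF, the plain list comprehension over get_attribute('href') is shorter and does the same job.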

