
Extracting links in a sequence from a table in a webpage using Selenium in Python

I want to extract the links to the PDFs from this page using Selenium in Python.

I managed to extract the entire table that contains the rows and the links to the PDFs.

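from selenium.webdriver.common.by import By

# driver is an already-created WebDriver; company_link is the company page URL
driver.get(company_link)
# follow the announcements link in the page heading
announcement_link = driver.find_element(By.XPATH, '//*[@id="heading1"]/h1/a').get_attribute('href')
driver.get(announcement_link)
# the cell that holds the announcement rows and their PDF links
table = driver.find_element(By.XPATH, '//*[@id="lblann"]/table/tbody/tr[4]/td')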

I am looking for the shortest possible way to build a list of all the PDF links, in the order they appear. How do I do that?


In the page you provided, each link has the class tablebluelink, which makes it easy to select all of them with one XPath expression that matches every a element whose class attribute has the value tablebluelink:

//a[@class='tablebluelink']

Selenium can only return elements, not attribute nodes, so an expression ending in /@href raises an InvalidSelectorException. Select the a elements with find_elements (find_elements_by_xpath was removed in Selenium 4.3) and read each href with get_attribute. The elements are returned in document order, so the list follows the sequence of the table:

elems = driver.find_elements(By.XPATH, "//a[@class='tablebluelink']")
links = [elem.get_attribute('href') for elem in elems]
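If some anchors with that class point to something other than a PDF, or the same file is linked more than once, a small helper can tidy the list while keeping the original order. This is only a sketch: the name pdf_links_in_order is hypothetical, and it assumes that PDF links end in ".pdf", which may not hold for every page.

```python
def pdf_links_in_order(elements):
    """Return the PDF hrefs of the given anchors, in order, without duplicates."""
    # Works with any objects exposing get_attribute(), e.g. Selenium WebElements.
    hrefs = (el.get_attribute("href") for el in elements)
    # Assumption: a PDF link ends in ".pdf"; anchors without an href are dropped.
    pdfs = [h for h in hrefs if h and h.lower().endswith(".pdf")]
    # dict.fromkeys keeps the first occurrence of each link and preserves order.
    return list(dict.fromkeys(pdfs))


# Usage with the driver from the question (needs a live browser session):
# elems = driver.find_elements(By.XPATH, "//a[@class='tablebluelink']")
# links = pdf_links_in_order(elems)
```

To search only inside the table already located in the question, call find_elements on that element with a relative expression, for example table.find_elements(By.XPATH, ".//a[@class='tablebluelink']").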
