Extracting links in sequence from a table on a web page using Selenium in Python

Question

I want to extract the links to the PDFs on this page using Selenium in Python.

I managed to extract the entire table that contains the rows and the links to the PDFs.

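from selenium.webdriver.common.by import By
# Open the company page, then follow its announcements link
driver.get(company_link)
announcement_link = driver.find_element(By.XPATH, '//*[@id="heading1"]/h1/a').get_attribute('href')
driver.get(announcement_link)
# Grab the table cell that contains the rows and the pdf links
table = driver.find_element(By.XPATH, '//*[@id="lblann"]/table/tbody/tr[4]/td')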

I am looking for the shortest possible way to create a list of all the PDF links, in the order they appear. How do I do that?

Answer 1

I want to extract links of pdfs from this page using Selenium in python

On the page you provided, every PDF link has the class tablebluelink, which makes the links easy to select. The following XPath expression selects the href attribute of all a elements whose class attribute has the value tablebluelink:

//a[@class='tablebluelink']/@href

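Selenium locators must resolve to elements, not attribute nodes, so pass find_elements the expression without the trailing /@href and read each href with get_attribute (find_elements_by_xpath was removed in Selenium 4.3):

links = [a.get_attribute('href') for a in driver.find_elements(By.XPATH, "//a[@class='tablebluelink']")]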
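
For reuse, the same idea can be wrapped in a small helper. This is only a sketch, assuming a Selenium 4 style driver; the names collect_links and LINK_XPATH are made up for illustration and are not part of Selenium. The literal string "xpath" is the value of By.XPATH, and it is used here so the helper can be exercised without importing Selenium.

```python
# Hypothetical helper (not part of Selenium): collect link targets in page order.
LINK_XPATH = "//a[@class='tablebluelink']"


def collect_links(driver, xpath=LINK_XPATH):
    """Return the href of every element matched by `xpath`, in the order found."""
    # "xpath" is the string value of selenium's By.XPATH locator strategy.
    anchors = driver.find_elements("xpath", xpath)
    # get_attribute returns None when the attribute is absent; skip those.
    hrefs = (anchor.get_attribute("href") for anchor in anchors)
    return [href for href in hrefs if href]


# Usage with a real browser (not executed here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(announcement_link)
#   pdf_links = collect_links(driver)
```

find_elements normally returns matches in document order, so the resulting list should follow the order of the rows in the table.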

Answer 1: 2023-01-22 01:29:46
