How to get a table with dynamic id using Selenium with Python
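I am trying to get a table from this page, https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm, and from other similar pages.
The table in question has a dynamic id of the form table-XXXX, where each X is a digit that changes every time the page loads.
The table has the following attributes: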
class="tablesaw tablesaw-stack table-bordered table-centered rates-availability-table"
data-tablesaw-mode="stack"
I have tried the following variations to locate the table (after consulting the post "How to find element by part of its id name in selenium with python"), but nothing seems to work.
find_elements_by_css_selector("[id*='tab']")
find_elements_by_css_selector("[class*='tablesaw']")
find_elements_by_css_selector("[data-tablesaw-mode*='stack']")
The table WebElement is an AJAX element, so to print it you have to induce WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.tablesaw.tablesaw-stack.table-bordered.table-centered.rates-availability-table"))).text)
Using XPATH:
driver.get('https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='tablesaw tablesaw-stack table-bordered table-centered rates-availability-table']"))).text)
Console output:
Start Date End Date 3 Nights 4 Nights 5 Nights 6 Nights 7 Nights
28 Mar 2020 1 May 2020 £225 £300 £350 £410 £470
2 May 2020 26 Jun 2020 £250 £330 £400 £460 £530
27 Jun 2020 3 Jul 2020 - - - - £675
4 Jul 2020 10 Jul 2020 - - - - £920
11 Jul 2020 14 Aug 2020 - - - - £985
15 Aug 2020 21 Aug 2020 - - - - £920
22 Aug 2020 28 Aug 2020 - - - - £675
29 Aug 2020 31 Oct 2020 - - - - £470
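If structured rows are more useful than one flat string, the printed text can be split back into columns. The following is only a minimal sketch: `parse_rates_text` is a hypothetical helper, and it assumes the element's `.text` puts each table row on its own line, with two dates followed by the price cells, as in the output above.

```python
import re

# A date cell looks like "28 Mar 2020"; a rate row is two dates plus price cells.
DATE = r"\d{1,2} [A-Za-z]{3} \d{4}"
ROW = re.compile(rf"^({DATE}) ({DATE}) (.+)$")


def parse_rates_text(text):
    """Turn the table's text into a list of dicts keyed by column name."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # The header holds "Start Date", "End Date", then the "<n> Nights" columns.
    night_columns = re.findall(r"\d+ Nights", lines[0])
    rows = []
    for line in lines[1:]:
        match = ROW.match(line)
        if match is None:
            continue  # skip anything that does not look like a rate row
        start, end, rest = match.groups()
        row = {"Start Date": start, "End Date": end}
        # Price cells are "£225"-style amounts or "-" when no rate applies.
        row.update(zip(night_columns, rest.split()))
        rows.append(row)
    return rows


sample = """Start Date End Date 3 Nights 4 Nights 5 Nights 6 Nights 7 Nights
28 Mar 2020 1 May 2020 £225 £300 £350 £410 £470
27 Jun 2020 3 Jul 2020 - - - - £675"""
rows = parse_rates_text(sample)
print(rows[1]["7 Nights"])  # £675
```

Reading the cells directly from the `tr`/`td` elements would be more robust than parsing text; this version only avoids further browser calls.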
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
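Putting the pieces together, the wait can be wrapped in a small function. This is a sketch under stated assumptions rather than a tested scraper: `css_for_classes` and `fetch_table_text` are hypothetical names, and Chrome with a matching driver is assumed to be installed. A class-based CSS selector matches whatever the order of the classes and even if the page adds more, whereas the XPath `@class='...'` comparison needs the exact attribute string.

```python
def css_for_classes(tag, class_attr):
    """Build 'tag.a.b.c' from a class attribute value such as 'a b c'."""
    return tag + "".join("." + name for name in class_attr.split())


def fetch_table_text(url, class_attr, timeout=20):
    """Open url, wait for the table to become visible, and return its text."""
    # Imported here so css_for_classes can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        locator = (By.CSS_SELECTOR, css_for_classes("table", class_attr))
        table = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located(locator)
        )
        return table.text
    finally:
        driver.quit()  # always close the browser, even if the wait times out


selector = css_for_classes("table", "tablesaw rates-availability-table")
print(selector)  # table.tablesaw.rates-availability-table
```

Calling `fetch_table_text(page_url, "tablesaw rates-availability-table")` would then open the page and return the table text once the AJAX content is visible; listing fewer classes keeps the selector from breaking if the styling classes change.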
The data is loaded dynamically via JavaScript, but you can use the site's API to load the table.
For example:
import requests
from bs4 import BeautifulSoup
url = 'https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm'
rates_url = 'https://www.holidayfrancedirect.co.uk/api/property-rates/{property_id}/2020'
property_id = url.split('/')[-2]
data = requests.get(rates_url.format(property_id=property_id)).json()
soup = BeautifulSoup(data['ratesHtml'], 'html.parser')
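# print table to screen:
for tr in soup.select('tr'):
    # collect the text of every header and data cell in this row
    tds = [td.get_text(strip=True) for td in tr.select('td, th')]
    # seven left-aligned columns, 15 characters wide each
    print(('{:<15}'*7).format(*tds))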
Prints:
Start Date     End Date       3 Nights       4 Nights       5 Nights       6 Nights       7 Nights
28 Mar 2020    1 May 2020     £225           £300           £350           £410           £470
2 May 2020     26 Jun 2020    £250           £330           £400           £460           £530
27 Jun 2020    3 Jul 2020     -              -              -              -              £675
4 Jul 2020     10 Jul 2020    -              -              -              -              £920
11 Jul 2020    14 Aug 2020    -              -              -              -              £985
15 Aug 2020    21 Aug 2020    -              -              -              -              £920
22 Aug 2020    28 Aug 2020    -              -              -              -              £675
29 Aug 2020    31 Oct 2020    -              -              -              -              £470
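Since the question also mentions other similar pages, the step that derives the rates URL can be isolated and reused. The sketch below uses hypothetical helper names; it takes the id from the URL path only, so a query string or fragment containing a slash does not shift the result. Treating the trailing 2020 as a year parameter is an assumption based on the URL pattern above and has not been checked against the site.

```python
from urllib.parse import urlsplit

# URL pattern taken from the answer above; {year} replaces the hard-coded 2020.
RATES_URL = "https://www.holidayfrancedirect.co.uk/api/property-rates/{property_id}/{year}"


def property_id_from_url(page_url):
    """Return the path segment just before the final file name."""
    segments = [part for part in urlsplit(page_url).path.split("/") if part]
    if len(segments) < 2:
        raise ValueError(f"no property id found in {page_url!r}")
    return segments[-2]


def rates_url_for(page_url, year=2020):
    """Build the rates API URL for a property page."""
    return RATES_URL.format(property_id=property_id_from_url(page_url), year=year)


page = "https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm?from=/search"
print(rates_url_for(page))
# https://www.holidayfrancedirect.co.uk/api/property-rates/RG007075/2020
```

The returned URL can be passed to `requests.get(...)` exactly as in the example above.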