
How to get a table with dynamic id using Selenium with Python

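I am trying to get a table from this page, https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm, and from other similar pages.

The table in question has a dynamic id, table-XXXX, where XXXX is a number that is different on every page load.

The table has the following attributes: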

class="tablesaw tablesaw-stack table-bordered table-centered rates-availability-table"

data-tablesaw-mode="stack"

I have tried the following variations to locate this table (having already consulted the post How to find element by part of its id name in selenium with python), but nothing seems to work.

find_elements_by_css_selector("[id*='tab']")

find_elements_by_css_selector("[class*='tablesaw']")

find_elements_by_css_selector("[data-tablesaw-mode*='stack']")

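The desired WebElement is an AJAX element, so to print its values you have to induce WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies: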

  • Using CSS_SELECTOR:

     driver.get('https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm')
     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.tablesaw.tablesaw-stack.table-bordered.table-centered.rates-availability-table"))).text)

  • Using XPATH:

     driver.get('https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm')
     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='tablesaw tablesaw-stack table-bordered table-centered rates-availability-table']"))).text)

  • Console output:

     Start Date End Date 3 Nights 4 Nights 5 Nights 6 Nights 7 Nights
     28 Mar 2020 1 May 2020 £225 £300 £350 £410 £470
     2 May 2020 26 Jun 2020 £250 £330 £400 £460 £530
     27 Jun 2020 3 Jul 2020 - - - - £675
     4 Jul 2020 10 Jul 2020 - - - - £920
     11 Jul 2020 14 Aug 2020 - - - - £985
     15 Aug 2020 21 Aug 2020 - - - - £920
     22 Aug 2020 28 Aug 2020 - - - - £675
     29 Aug 2020 31 Oct 2020 - - - - £470

  • Note: you have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
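
If you want structured rows rather than one printed string, the text returned by the element can be split up afterwards. The following is only a minimal sketch: it assumes that `.text` yields one table row per line and that every data row has two three-token dates followed by the price columns, as in the output above. The function name `parse_rates_text` and the `sample` string are made up for illustration.

```python
def parse_rates_text(table_text):
    """Turn the table's visible text into a list of dicts, one per rate row."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in table_text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # skip the header line
        tokens = line.split()
        # Assumption: each date is three tokens ("28 Mar 2020");
        # everything after the two dates is a price column.
        rows.append({
            "start": " ".join(tokens[0:3]),
            "end": " ".join(tokens[3:6]),
            "prices": tokens[6:],
        })
    return rows


# Shortened stand-in for the string returned by the element's .text
sample = """Start Date End Date 3 Nights 4 Nights 5 Nights 6 Nights 7 Nights
28 Mar 2020 1 May 2020 £225 £300 £350 £410 £470
27 Jun 2020 3 Jul 2020 - - - - £675"""

for row in parse_rates_text(sample):
    print(row["start"], "->", row["end"], row["prices"])
```

Splitting on whitespace is fragile if a cell ever contains spaces of its own; reading the individual `td` elements would be more robust in that case.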

The data is loaded dynamically via JavaScript, but you can use their API to load the table.

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm'
rates_url = 'https://www.holidayfrancedirect.co.uk/api/property-rates/{property_id}/2020'
property_id = url.split('/')[-2]

data = requests.get(rates_url.format(property_id=property_id)).json()
soup = BeautifulSoup(data['ratesHtml'], 'html.parser')

# print table to screen:
for tr in soup.select('tr'):
    tds = [td.get_text(strip=True) for td in tr.select('td, th')]
    print(('{:<15}'*7).format(*tds))

Prints:

Start Date     End Date       3 Nights       4 Nights       5 Nights       6 Nights       7 Nights       
28 Mar 2020    1 May 2020     £225           £300           £350           £410           £470           
2 May 2020     26 Jun 2020    £250           £330           £400           £460           £530           
27 Jun 2020    3 Jul 2020     -              -              -              -              £675           
4 Jul 2020     10 Jul 2020    -              -              -              -              £920           
11 Jul 2020    14 Aug 2020    -              -              -              -              £985           
15 Aug 2020    21 Aug 2020    -              -              -              -              £920           
22 Aug 2020    28 Aug 2020    -              -              -              -              £675           
29 Aug 2020    31 Oct 2020    -              -              -              -              £470           
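
If you need the rates as data rather than as printed text, the same HTML can be turned into a list of dicts keyed by the header row. This is a rough sketch, not the site's documented format: `sample_html` is a hand-written stand-in for `data['ratesHtml']`, the real markup may differ, and the helper name `table_to_records` is hypothetical.

```python
from bs4 import BeautifulSoup


def table_to_records(html):
    """Return one dict per data row, keyed by the text of the header cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("tr")
    if not rows:
        return []
    # Assumption: the first row holds the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].select("th, td")]
    records = []
    for tr in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in tr.select("td, th")]
        records.append(dict(zip(headers, cells)))
    return records


# Hand-written sample standing in for data['ratesHtml'] (assumed shape).
sample_html = """
<table id="table-1234">
  <tr><th>Start Date</th><th>End Date</th><th>7 Nights</th></tr>
  <tr><td>28 Mar 2020</td><td>1 May 2020</td><td>£470</td></tr>
  <tr><td>27 Jun 2020</td><td>3 Jul 2020</td><td>£675</td></tr>
</table>
"""

records = table_to_records(sample_html)
for record in records:
    print(record)
```

Because the rows are looked up by tag rather than by id, the changing `table-XXXX` id does not matter here.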
