Extracting a date from webpage using selenium webdriver

For one of my projects, I am trying to run a looped script (for and while) that runs until a particular predefined date is reached. My code is aimed at extracting the flight data of an aircraft between two specified dates. My source of information is the publicly available flight tracker flightradar24.com, for which I also have a business subscription (for example, https://www.flightradar24.com/data/aircraft/d-abyt). When collecting the list of flights from the page, I want to be able to read a date and stop the loop if it is past the specified date.

The HTML source looks like the following: [image: example]

In this case, 22 Mar 2020 is the value I want to read and use for the comparison.

So far I've tried the following to extract the date:

element = driver.find_element_by_class_name('w40 hidden-xs hidden-sm')

print(driver.find_element_by_xpath("//time[@class='hidden-xs hidden-sm']").text)

print(driver.find_element_by_xpath("//time[@class='w40 hidden-xs hidden-sm']").get_attribute("data-time-format"))

and

element = driverfox.find_element_by_xpath('// time[ @class ="data-time-format"] / @ datetime'.__getattribute__("data-time-format"))

Thank you in advance for your advice! [image: Flightradar HTML code]

Error when executing Debanjan's CSS suggestion.

Use the following to print out the element's text.

print(driver.find_element_by_css_selector('td.hidden-xs.hidden-sm').text)

You should also use an explicit WebDriverWait so that the element is present before you read it:

elem = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "td.hidden-xs.hidden-sm"))).text
print(elem)

This needs the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

To print the date, i.e. 22 Mar 2020, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR and text attribute:

     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tr.data-row[data-timestamp] td[data-timestamp][data-time-format='DD MMM YYYY']"))).text)
  • Using XPATH and get_attribute() :

     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='data-row']//td[@data-timestamp and @data-time-format='DD MMM YYYY']"))).get_attribute("innerHTML"))
  • Note : You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
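Once the date text has been read, it still has to be converted into a real date before it can be compared with the stop date. The following is only a minimal sketch of that step: the helper names parse_row_date and dates_up_to are made up for illustration, the sample strings stand in for text read from the page, and it assumes the rows are visited in ascending date order and that the month abbreviations are English.

```python
from datetime import date, datetime


def parse_row_date(text):
    """Parse text such as '22 Mar 2020' (the page's 'DD MMM YYYY' format) into a date."""
    # %b matches abbreviated month names for the current locale;
    # this sketch assumes an English locale.
    return datetime.strptime(text.strip(), "%d %b %Y").date()


def dates_up_to(date_texts, cutoff):
    """Yield parsed dates in order, stopping at the first one later than cutoff."""
    for text in date_texts:
        row_date = parse_row_date(text)
        if row_date > cutoff:
            break  # past the specified date, so stop the loop
        yield row_date


# Hypothetical sample values standing in for the .text of each date cell.
sample = ["20 Mar 2020", "21 Mar 2020", "22 Mar 2020", "23 Mar 2020"]
for d in dates_up_to(sample, date(2020, 3, 22)):
    print(d.isoformat())
```

If the page lists the newest flights first, the comparison would be reversed (stop when row_date < cutoff).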

The problem is that you're trying to match multiple class names at once. XPath is not well suited to this, because @class='...' compares against the entire attribute string. CSS selectors handle it naturally, which is why this works well:

driver.find_element_by_css_selector('td.hidden-xs.hidden-sm')

You can also accomplish this with XPath, for example:

driver.find_element_by_xpath("//td[contains(@class,'hidden-xs') and contains(@class, 'hidden-sm')]")

And, as always, it's best to introduce a wait before locating an element:

element = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.XPATH, "//td[contains(@class,'hidden-xs') and contains(@class, 'hidden-sm')]"))
    )
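One caveat with contains(@class, 'hidden-xs') is that it is a substring test, so it would also match a class such as 'hidden-xs-extra'. A common XPath idiom pads the normalized class attribute with spaces and looks for the whole token. The helper below is a hypothetical sketch (xpath_with_classes is not part of Selenium) that only builds the expression string, which can then be passed to By.XPATH:

```python
def xpath_with_classes(tag, *class_names):
    """Build an XPath matching <tag> elements that carry every given class token."""
    if not class_names:
        return f"//{tag}"
    conditions = [
        # Pad with spaces so only whole class tokens match, not substrings.
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in class_names
    ]
    return f"//{tag}[" + " and ".join(conditions) + "]"


print(xpath_with_classes("td", "hidden-xs", "hidden-sm"))
```

The order of the class names in the HTML does not matter with this form, which is the same behaviour the CSS selector td.hidden-xs.hidden-sm gives.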

