Python - JavaScript web scraping with Selenium does not work properly

I'm trying to scrape some data from a flight-search web page. It is probably generated with JavaScript. I've tried many approaches but nothing works, so I decided to try Selenium.

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.pelikan.sk/sk/flights/list?dfc=CVIE%20BUD%20BTS&dtc=CMAD&rfc=CMAD&rtc=CVIE%20BUD%20BTS&dd=2015-07-09&rd=2015-07-14&px=1000&ns=0&prc=&rng=1&rbd=0&ct=0')
print(driver.page_source)

I thought it would return the final JavaScript-generated HTML, but I can't find the strings that appear on the page when I open it in a browser.

Where could the problem be? What should I do to get those flights?

EDIT: I forgot to mention that the page keeps loading new flights. When you open it in a browser it shows some flights, but it is still loading others.

The page is quite dynamic, and you need to wait for it to load. Choose something that indicates that the page and the search results have loaded. For instance, wait until the loading image (with a pelican) becomes invisible:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get("https://www.pelikan.sk/sk/flights/list?dfc=CVIE%20BUD%20BTS&dtc=CMAD&rfc=CMAD&rtc=CVIE%20BUD%20BTS&dd=2015-07-09&rd=2015-07-14&px=1000&ns=0&prc=&rng=1&rbd=0&ct=0")

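# Allow up to 60 seconds for each condition before raising TimeoutException.
wait = WebDriverWait(driver, 60)
# Wait for the large loading image to disappear.
wait.until(EC.invisibility_of_element_located((By.XPATH, '//img[contains(@src, "loading")]')))
# Wait for the small image next to the "still searching for more flights" message to disappear.
wait.until(EC.invisibility_of_element_located((By.XPATH, u'//div[. = "Poprosíme o trpezlivosť, hľadáme pre Vás ešte viac letov"]/preceding-sibling::img')))

# The source now includes the JavaScript-generated results.
print(driver.page_source)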

Here we are waiting for two pelicans to disappear: a bigger one and a smaller one.
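
If no loading indicator can be targeted reliably, another option for a page that keeps adding results is to poll some measurable value until it stops changing. The following is only a minimal sketch of that idea, not part of Selenium: the helper name wait_until_stable and its parameters are hypothetical, and the demo uses a fake counter in place of a real browser.

```python
import time


def wait_until_stable(get_value, interval=1.0, stable_rounds=3, timeout=60.0):
    """Poll get_value() until it returns the same value stable_rounds times in a row.

    Returns the last value seen. Raises TimeoutError if the value keeps
    changing for longer than `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = get_value()
    unchanged = 1  # number of consecutive polls that returned `last`
    while unchanged < stable_rounds:
        if time.monotonic() > deadline:
            raise TimeoutError("value did not stabilise within %s seconds" % timeout)
        time.sleep(interval)
        current = get_value()
        if current == last:
            unchanged += 1
        else:
            # The value changed, so start counting again from this new value.
            last, unchanged = current, 1
    return last


if __name__ == "__main__":
    # Stand-in for a page that keeps adding results: 3, 7, 12, then stays at 12.
    counts = iter([3, 7, 12, 12, 12, 12])
    print(wait_until_stable(lambda: next(counts), interval=0.01))
```

With a real driver, one could pass something like lambda: len(driver.page_source) as get_value and read driver.page_source once the call returns. Treating a few unchanged polls as "finished" is a heuristic, so the interval and the number of rounds would need tuning for the page in question.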

