Python: scrape flight and price data from Skyscanner

Question

I am trying to get the price data from the URL below. However, I can only seem to get the text from the divs down to a certain level. Here is my code:

from selenium import webdriver
from bs4 import BeautifulSoup

def scrape_flight_prices(URL):
    browser = webdriver.PhantomJS()
    # Load the page and parse the HTML
    browser.get(URL)
    soup = BeautifulSoup(browser.page_source, "lxml")
    page_divs = soup.findAll("div", attrs={'id': 'app-root'})
    for p in page_divs:
        print(p)

if __name__ == '__main__':
    URL1 = "https://www.skyscanner.net/transport/flights/brs/gnb/190216/190223/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#results"
    scrape_flight_prices(URL1)

And here is the output:

<div id="app-root">
<section class="day-content state-loading state-no-results" id="daysection">
<div class="day-searching">
<div class="hot-spinner medium"></div>
<div class="day-searching-message">Searching</div>
</div>
</section>
</div>

The section of HTML I want to scrape is in the results on this page:

https://www.skyscanner.net/transport/flights/brs/gnb/190216/190223/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#results

However, when I try to scrape it with the following code:

prices = soup.findAll("a", attrs={'target':"_blank", "data-e2e":"itinerary-price", "class":"CTASection__price-2bc7h price"})  
for p in prices:
    print(p)

It prints nothing! I suspect some JavaScript is running to generate the rest of the HTML and/or data. Can anyone help me extract the data? Specifically, I am trying to get the price, flight times, airline name, etc., but if Beautiful Soup is not seeing the relevant HTML from the page, I'm not sure how else to get it.

Would appreciate any pointers! Many thanks in advance!

Answer 1

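Try the code below to get the prices. The output in your question shows the page still in its "Searching" state, so instead of parsing page_source straight away, this waits up to 10 seconds for the price elements to appear: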

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

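# browser is the webdriver instance created in the question's code.
# Wait up to 10 seconds for the elements with class "price" to be present.
price_elements = wait(browser, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "price"))
)
prices = [price.text for price in price_elements]
print(prices)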
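
If the wait still times out, it may help to confirm whether the page was captured before the results finished loading. The sketch below is only a diagnostic, not part of Selenium or Beautiful Soup: it uses the standard library's HTML parser to look for the state-loading class that appears in the output from the question. The names LoadingStateParser and is_still_loading are hypothetical helpers, and it assumes Skyscanner keeps using that class name while searching.

```python
from html.parser import HTMLParser


class LoadingStateParser(HTMLParser):
    """Records whether any tag carries the 'state-loading' class."""

    def __init__(self):
        super().__init__()
        self.loading = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if "state-loading" in classes:
            self.loading = True


def is_still_loading(html):
    """Return True if the HTML still shows the loading placeholder."""
    parser = LoadingStateParser()
    parser.feed(html)
    return parser.loading


# Trimmed copy of the output shown in the question.
sample = (
    '<div id="app-root">'
    '<section class="day-content state-loading state-no-results" id="daysection">'
    '<div class="day-searching-message">Searching</div>'
    '</section></div>'
)
print(is_still_loading(sample))  # True
```

With Selenium, this could be called as is_still_loading(browser.page_source) before handing the source to Beautiful Soup. A True result suggests the page needs more time or an explicit wait, as in the code above.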

solution1
2 ACCEPTED 2018-10-27 21:19:34
