I am trying to scrape data from the following webpage: https://skiplagged.com/flights/YTO/DXB/2020-08-21
The element I am trying to target is the following: //div[@class='infinite-trip-list']//div[@class='span1 trip-duration']
This list adds elements dynamically as the user scrolls. My goal is to store these elements in a variable so that I can extract the duration of each flight. So far I have not been able to do that; this is what I tried after reading several Stack Overflow posts on similar issues:
    mylist = []
    last = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(1)  # let the page load
        new = driver.execute_script("return document.body.scrollHeight")
        infinite_list = driver.find_elements_by_xpath("//div[@class='infinite-trip-list']//div[@class='span1 trip-duration']")
        for elem in infinite_list:
            if elem not in mylist:
                mylist.append(elem.text)
        # if the new height equals the last height, we have reached the end of the page, so end the while loop
        if new == last:
            break
        last = new  # changing last's value to new
This scrolls the page all the way to the bottom, and as a result I only see the last 10 values. I have not been able to write code that scrolls and adds only the new divs (elements) as they are added.
Try the approach below, which uses the Requests library to call the site's search API directly. It is fast, reliable, and needs less code to get the desired output. I took the API URL from the website itself; it returns the results for a given search via a GET request.
You can fetch more details with the script below. Right now it fetches prices, flight numbers, hop details at each airport, duration, etc.
    import requests


    def scrap_flights_details():
        # Search parameters: origin, destination and departure date
        from_source = 'YTO'
        to_destination = 'DXB'
        depart_date = '2020-08-21'
        return_date = ''  # left empty for a one-way search
        counts_adults = 1
        counts_children = ''
        API_URL = ('https://skiplagged.com/api/search.php?from=' + str(from_source)
                   + '&to=' + str(to_destination) + '&depart=' + str(depart_date)
                   + '&return=' + str(return_date)
                   + '&format=v3&counts%5Badults%5D=' + str(counts_adults)
                   + '&counts%5Bchildren%5D=' + str(counts_children))
        print('URL created: ', API_URL)
        # verify=False skips TLS certificate verification
        flights_details = requests.get(API_URL, verify=False).json()
        for flight_number in flights_details['itineraries']['outbound']:
            print('-' * 100)
            print('Flight Number : ', flight_number['flight'])
            print('Flight Price : ', flight_number['one_way_price'])
            # The flight identifier is also the key into the 'flights' mapping
            number = flight_number['flight']
            print('Flight Details : ', flights_details['flights'][number])
            print('-' * 100)


    scrap_flights_details()
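If you would rather collect the results than print them, the same request can be split into two small helpers. This is only a sketch: the names `build_search_url` and `outbound_rows` are made up for illustration, and it relies only on the response keys the script above already uses (`itineraries`, `outbound`, `flight`, `one_way_price`, `flights`). Which field inside each flight's details holds the duration is not assumed here; inspect one entry to find it.

```python
from urllib.parse import urlencode

API_BASE = "https://skiplagged.com/api/search.php"


def build_search_url(origin, destination, depart_date,
                     return_date="", adults=1, children=""):
    """Build the search URL; urlencode percent-encodes the bracketed keys."""
    params = {
        "from": origin,
        "to": destination,
        "depart": depart_date,
        "return": return_date,
        "format": "v3",
        "counts[adults]": adults,
        "counts[children]": children,
    }
    return API_BASE + "?" + urlencode(params)


def outbound_rows(flights_details):
    """Return (flight id, one-way price, flight details) for each outbound itinerary."""
    rows = []
    for itinerary in flights_details["itineraries"]["outbound"]:
        flight_id = itinerary["flight"]
        details = flights_details["flights"][flight_id]
        rows.append((flight_id, itinerary["one_way_price"], details))
    return rows
```

For example, `build_search_url('YTO', 'DXB', '2020-08-21')` produces the same URL as the string concatenation above, and `outbound_rows(requests.get(url).json())` gives a list you can loop over or load into a table.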