How can I use Selenium in Python to scrape data from a webpage that adds divs on scroll?

Question

I am trying to scrape data from the following webpage: https://skiplagged.com/flights/YTO/DXB/2020-08-21

The element I am trying to target is the following: div[@class='infinite-trip-list']//div[@class='span1 trip-duration']

This is a list that adds elements dynamically as the user scrolls. My goal is to store these elements in a variable so that I can extract the duration of each flight. So far I have not been able to do that. This is what I tried after reading several Stack Overflow posts on similar issues:

mylist = []

last = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1) #let the page load
    new = driver.execute_script("return document.body.scrollHeight")
    infinite_list = driver.find_elements_by_xpath("//div[@class='infinite-trip-list']//div[@class='span1 trip-duration']")
    for elem in infinite_list:
        if elem not in mylist:
            mylist.append(elem.text)
    if new == last: #if new height is equal to last height then we have reached the end of the page so end the while loop
        break
    last = new #changing last's value to new

This scrolls the page to the bottom, and as a result I only see the last 10 values. I have not been able to write code that scrolls and adds only the new divs (elements) as they are added.
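Two details of the attempt above are worth noting: `elem not in mylist` compares a WebElement against stored text strings, so it is always true, and `find_elements_by_xpath` was removed in Selenium 4 in favour of `find_elements(by, value)`. The following is only a minimal sketch of incremental scrolling, not a verified fix for this site. It assumes the page keeps just a window of rows in the DOM and does not recycle DOM nodes for new rows. The function name and the `step_px`, `pause_s` and `max_steps` parameters are hypothetical.

```python
import time


def collect_texts_while_scrolling(driver, xpath, step_px=600, pause_s=1.0, max_steps=200):
    """Scroll down in small steps and collect the text of matching elements.

    Elements are read after every step, so rows that the page later removes
    from the DOM are still captured. De-duplication uses the WebElement id
    (ASSUMPTION: the page does not reuse one DOM node for different rows),
    so two flights with the same duration text are both kept.
    """
    seen_ids = set()
    texts = []
    state_js = "return [window.pageYOffset, document.body.scrollHeight];"
    for _ in range(max_steps):
        # "xpath" is the locator strategy string that By.XPATH stands for.
        for elem in driver.find_elements("xpath", xpath):
            if elem.id not in seen_ids:
                seen_ids.add(elem.id)
                texts.append(elem.text)
        before = driver.execute_script(state_js)
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause_s)  # give the page time to add new rows
        after = driver.execute_script(state_js)
        # Stop once neither the scroll position nor the page height changes.
        if after == before:
            break
    return texts
```

It would be called with an already opened `driver` and the XPath from the question, for example `collect_texts_while_scrolling(driver, "//div[@class='infinite-trip-list']//div[@class='span1 trip-duration']")`. A fixed sleep is a simplification; an element removed between lookup and read can still raise a stale-element error.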

Answer 1

Try the approach below, which uses the Requests library to call the site's API directly. It is fast and reliable, and it needs less code to get the desired output. I took the API URL from the website; a GET request to it returns the results for a given search.

First, I build the URL dynamically. The script below declares 6 variables that make up the API URL; through them you can pass your search criteria, such as origin, destination, departure date, return date, and the number of adults and children.
After the URL is created, requests sends a GET request to the API URL and converts the response to JSON.
Finally, I loop over the flight numbers and look up the details for each one, such as price, duration and segments (the hop details: flight number and airline name at each airport, with times).

You can fetch more details with the script below. Right now it fetches prices, flight numbers, hop details at each airport, durations, etc.

import requests


def scrap_flights_details():
    # Search criteria used to build the API URL
    from_source = 'YTO'
    to_destination = 'DXB'
    depart_date = '2020-08-21'
    return_date = ''
    counts_adults = 1
    counts_children = ''

    API_URL = ('https://skiplagged.com/api/search.php?from=' + str(from_source)
               + '&to=' + str(to_destination) + '&depart=' + str(depart_date)
               + '&return=' + str(return_date) + '&format=v3&counts%5Badults%5D=' + str(counts_adults)
               + '&counts%5Bchildren%5D=' + str(counts_children))
    print('URL created: ', API_URL)
    # verify=False disables TLS certificate verification
    flights_details = requests.get(API_URL, verify=False).json()

    # Each outbound itinerary carries a flight key; its details live under 'flights'
    for flight_number in flights_details['itineraries']['outbound']:
        print('-' * 100)
        print('Flight Number : ', flight_number['flight'])
        print('Flight Price : ', flight_number['one_way_price'])
        number = flight_number['flight']
        print('Flight Details : ', flights_details['flights'][number])
        print('-' * 100)


scrap_flights_details()
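Since the question asks for the duration of each flight, the parsed JSON can also be collected into a list instead of printed. This is a minimal sketch: the keys `itineraries`, `outbound`, `flight`, `one_way_price` and `flights` are the ones used in the script above, while the `duration` key inside each flight's details is an assumption based on the answer's description and should be checked against a real response. The function name `summarize_outbound` is hypothetical.

```python
def summarize_outbound(payload):
    """Return one dict per outbound itinerary with its flight key, price and duration."""
    flights = payload.get('flights', {})
    rows = []
    for itinerary in payload.get('itineraries', {}).get('outbound', []):
        flight_key = itinerary['flight']
        details = flights.get(flight_key, {})
        # ASSUMPTION: the details are a dict with a 'duration' entry; None otherwise.
        duration = details.get('duration') if isinstance(details, dict) else None
        rows.append({
            'flight': flight_key,
            'one_way_price': itinerary.get('one_way_price'),
            'duration': duration,
        })
    return rows
```

The argument would be the dictionary that `requests.get(API_URL, verify=False).json()` returns in the script above.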

Answer 1
0 · Accepted · 2020-08-15 06:34:25
