I'm trying to scrape hotel data from Expedia, for example all the hotel links for Cavendish, Canada, from 01/01/2020 to 01/03/2020. The problem is that I can only scrape 20 of them, while each place actually has 200+ results. The sample page is the search URL used in the code below.
Scraping code:
import lxml  # parser backend used by BeautifulSoup below
import re
import requests
from bs4 import BeautifulSoup
import xlwt
import pandas as pd
import numpy as np

# Search results page for Cavendish Beach, 01/01/2020 to 01/03/2020
url = 'https://www.expedia.com/Hotel-Search?adults=2&destination=Cavendish%20Beach%2C%20Cavendish%2C%20Prince%20Edward%20Island%2C%20Canada&endDate=01%2F03%2F2020&latLong=46.504395%2C-63.439669&regionId=6261119&rooms=1&sort=RECOMMENDED&startDate=01%2F01%2F2020'
# Send a browser-like User-Agent with the request
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36'}
res = requests.get(url, headers=header)
soup = BeautifulSoup(res.content, 'lxml')
# Select every <a> that has both the listing__link and uitk-card-link classes
t1 = soup.select('a.listing__link.uitk-card-link')
Every link is stored in <a class="listing__link uitk-card-link" href="xxxxxxx"></a> inside an <li></li>, and there is no difference in the HTML structure between the hotels I get and the ones I don't. Can anyone explain this?
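To make the selection step concrete, here is a minimal sketch of what the selector above does: it keeps the href of every <a> whose class attribute contains both listing__link and uitk-card-link. It uses only the standard library's html.parser as a stand-in for BeautifulSoup, and the LinkCollector class and SAMPLE_HTML markup are made up for illustration, not taken from Expedia.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry all required classes."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute is a space-separated list of class names
        classes = set((attr_map.get("class") or "").split())
        if self.required <= classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


# Made-up markup standing in for the HTML of the first response
SAMPLE_HTML = """
<ul>
  <li><a class="listing__link uitk-card-link" href="/hotel-1">Hotel 1</a></li>
  <li><a class="listing__link uitk-card-link" href="/hotel-2">Hotel 2</a></li>
  <li><a class="other-link" href="/not-a-listing">Other</a></li>
</ul>
"""

collector = LinkCollector(["listing__link", "uitk-card-link"])
collector.feed(SAMPLE_HTML)
print(collector.links)  # ['/hotel-1', '/hotel-2']
```

Only anchors that are present in the parsed markup can be collected, so the number of links found is bounded by what the server put into that one response.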
They use an API call to fetch the next 20 records, so the remaining hotels are never part of the HTML that requests.get returns. There is no way to scrape the next 20 records from that page alone.
The API call is triggered when you click "Show More".
The API requires authentication before it returns any data.
Note: scraping this way works only when the data is in the initial HTML, that is, when no AJAX call and no authentication are involved.
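The behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: initial_page, show_more, VALID_TOKEN and the hotel list are all hypothetical and kept in memory, and nothing here calls or reflects Expedia's real API. It shows why a plain page fetch yields 20 records, and why the remaining ones need an authenticated, offset-based call.

```python
PAGE_SIZE = 20
ALL_HOTELS = [f"/hotel-{i}" for i in range(1, 201)]  # pretend 200 results exist
VALID_TOKEN = "secret-token"  # hypothetical credential, for illustration only


def initial_page():
    """What a plain GET of the search page contains: only the first batch."""
    return ALL_HOTELS[:PAGE_SIZE]


def show_more(offset, token=None):
    """Hypothetical stand-in for the authenticated call behind "Show More"."""
    if token != VALID_TOKEN:
        raise PermissionError("authentication required")
    return ALL_HOTELS[offset:offset + PAGE_SIZE]


def collect_all(token):
    """Start from the first page, then request batches until one is empty."""
    results = initial_page()
    while True:
        batch = show_more(len(results), token)
        if not batch:
            break
        results.extend(batch)
    return results


scraped = initial_page()
print(len(scraped))  # 20, which is all a static scrape can see

try:
    show_more(len(scraped))  # no token, as with a plain scraper
except PermissionError as exc:
    print("blocked:", exc)

print(len(collect_all(VALID_TOKEN)))  # 200
```

In the simulation the full list is reachable only with the token, which mirrors the point of the answer: without whatever authentication the site expects, the scraper stops at the first batch.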